Frame Element Length Transmission in Audio Coding
Patent abstract:
Frame Element Length Transmission in Audio Coding. Frame elements that are to be made available for skipping can be transmitted more efficiently by arranging for default payload length information to be transmitted within a configuration block, with the length information within the frame elements themselves, in turn, being subdivided into a default payload length flag followed, if the default payload length flag is not set, by a payload length value explicitly coding the payload length of the respective frame element. If the default payload length flag is set, however, an explicit transmission of the payload length can be avoided. Preferably, any frame element whose default extension payload length flag is set has the default payload length, and any frame element whose default extension payload length flag is not set has a payload length corresponding to the payload length value. By this measure, the transmission efficiency is increased.

Publication number: BR112013023949A2
Application number: R112013023949-2
Filing date: 2012-03-19
Publication date: 2020-11-10
Inventors: Max Neuendorf; Markus Multrus; Stefan Döhla; Heiko Purnhagen; Frans De Bont
Applicants: Fraunhofer-Gesellschaft Zur Förderung Der Angewandten Forschung E.V.; Dolby International Ab; Koninklijke Philips N.V.
IPC main class:
Patent description:
Frame Element Length Transmission in Audio Coding

Specification

The present invention relates to audio coding, such as the USAC (Unified Speech and Audio Coding) codec, and, in particular, to the transmission of the length of frame elements.

In the past few years, several audio codecs have become available, each audio codec being specifically designed to fit a dedicated application. Mostly, these audio codecs are able to code more than one audio channel or audio signal in parallel. Some audio codecs are even suited to differently code audio content by grouping the audio channels or audio objects of the audio content and subjecting these groups to different audio coding principles. In addition, some of these audio codecs allow the insertion of extension data into the data stream so as to accommodate future extensions/developments of the audio codec.

One example of such an audio codec is the USAC codec, as defined in ISO/IEC CD 23003-3. This standard, entitled "Information Technology - MPEG Audio Technologies - Part 3: Unified Speech and Audio Coding", describes in detail the functional blocks of the reference model of a call for proposals on unified speech and audio coding. Figures 5a and 5b show block diagrams of the decoder and encoder. In the following, the general functionality of the individual blocks is briefly explained. Thereafter, the problem of putting all the parts of the resulting syntax together into a data stream is explained with reference to figure 6. The block diagrams of the USAC encoder and decoder in figures 5a and 5b reflect the structure of MPEG-D USAC coding.
The general structure can be described as follows: first, there is a common pre/post-processing consisting of an MPEG Surround (MPEGS) functional unit handling the stereo or multi-channel processing and an enhanced SBR (eSBR) unit handling the parametric representation of the higher audio frequencies in the input signal. Then there are two branches, one consisting of an Advanced Audio Coding (AAC) tool path and the other consisting of a linear prediction coding (LP or LPC domain) based path, which in turn features either a frequency domain representation or a time domain representation of the LPC residual. All transmitted spectra for both AAC and LPC are represented in the MDCT domain following quantization and arithmetic coding. The time domain representation uses an ACELP excitation coding scheme.

The basic structure of MPEG-D USAC is shown in figure 5a and figure 5b. The data flow in these diagrams is from left to right, top to bottom. The functions of the decoder are to find the description of the quantized audio spectrum or time domain representation in the data stream payload and to decode the quantized values and other reconstruction information. In the case of transmitted spectral information, the decoder shall reconstruct the quantized spectrum, process the reconstructed spectrum through whatever tools are active in the data stream payload in order to arrive at the actual signal spectrum as described by the input data stream payload, and finally convert the frequency domain spectrum to the time domain. Following the initial reconstruction and scaling of the spectrum reconstruction, there are optional tools that modify one or more of the spectra in order to provide more efficient coding.
In the case of a transmitted time domain signal representation, the decoder shall reconstruct the quantized time signal and process the reconstructed time signal through whatever tools are active in the data stream payload in order to arrive at the actual time domain signal as described by the input data stream payload. For each of the optional tools that operate on the signal data, the option to "pass through" is retained, and in all cases where the processing is omitted, the spectrum or time samples at the tool's input are passed directly through the tool without modification. In places where the data stream changes its signal representation from a time domain to a frequency domain representation, or from an LP domain to a non-LP domain or vice versa, the decoder shall facilitate the transition from one domain to the other by means of an appropriate transition overlap-add windowing. eSBR and MPEGS processing is applied in the same manner to both coding paths after the transition handling.

The input to the data stream payload demultiplexer tool is the MPEG-D USAC data stream payload. The demultiplexer separates the data stream payload into the parts for each tool and provides each of the tools with the data stream payload information related to that tool. The outputs of the data stream payload demultiplexer tool are: depending on the type of core coding in the current frame, either the quantized and noiselessly coded spectrum represented by: scale factor information; arithmetically coded spectral lines; or: linear prediction (LP) parameters together with an excitation signal represented by: quantized and arithmetically coded spectral lines (TCX, transform coded excitation); or ACELP coded time domain excitation.
Further outputs are: spectral noise filling information (optional); M/S decision information (optional); temporal noise shaping (TNS) information (optional); filter bank control information; time unwarping (TW) information (optional); enhanced spectral bandwidth replication (eSBR) control information (optional); MPEG Surround (MPEGS) control information.

The scale factor noiseless decoding tool takes information from the data stream payload demultiplexer, parses that information, and decodes the Huffman and DPCM coded scale factors. The input to the scale factor noiseless decoding tool is: the scale factor information for the noiselessly coded spectrum. The output of the scale factor noiseless decoding tool is: the decoded integer representation of the scale factors.

The spectral noiseless decoding tool takes information from the data stream payload demultiplexer, parses that information, decodes the arithmetically coded data, and reconstructs the quantized spectrum. The input to this noiseless decoding tool is: the noiselessly coded spectrum. The output of this noiseless decoding tool is: the quantized values of the spectrum.

The inverse quantizer tool takes the quantized values for the spectrum and converts the integer values to the non-scaled, reconstructed spectrum. This quantizer is a companding quantizer, whose companding factor depends on the chosen core coding mode. The input to the inverse quantizer tool is: the quantized values for the spectrum. The output of the inverse quantizer tool is: the non-scaled, inversely quantized spectrum.

The noise filling tool is used to fill spectral gaps in the decoded spectrum which occur when spectral values are quantized to zero, for example due to a strong restriction on bit demand in the encoder. The use of the noise filling tool is optional.
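As an aside, the companding inverse quantization just described can be sketched as follows. This is an illustrative sketch only: the 4/3 power law assumed here is the one used in AAC-family codecs, whereas the actual companding factor depends on the chosen core coding mode.

```python
def inverse_quantize(q_values, power=4.0 / 3.0):
    """Map integer quantized spectral values to non-scaled spectral
    values via sign(q) * |q|**power (companding inverse quantizer)."""
    out = []
    for q in q_values:
        sign = -1.0 if q < 0 else 1.0
        out.append(sign * abs(q) ** power)  # expand the companded value
    return out
```

For example, `inverse_quantize([0, 1, -8])` yields approximately `[0.0, 1.0, -16.0]`; the non-uniform law gives small quantized values a finer effective step size than large ones.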
The inputs to the noise filling tool are: the non-scaled, inversely quantized spectrum; noise filling parameters; the decoded integer representation of the scale factors. The outputs of the noise filling tool are: the non-scaled, inversely quantized spectral values for spectral lines that were previously quantized to zero; a modified integer representation of the scale factors.

The rescaling tool converts the integer representation of the scale factors to the actual values and multiplies the non-scaled, inversely quantized spectrum by the relevant scale factors. The inputs to the scale factors tool are: the decoded integer representation of the scale factors; the non-scaled, inversely quantized spectrum. The output of the scale factors tool is: the scaled, inversely quantized spectrum.

For an overview of the M/S tool, please refer to ISO/IEC 14496-3:2009, 4.1.1.2. For an overview of the temporal noise shaping (TNS) tool, please refer to ISO/IEC 14496-3:2009, 4.1.1.2.

The filter bank / block switching tool applies the inverse of the frequency mapping that was carried out in the encoder. An inverse modified discrete cosine transform (IMDCT) is used for the filter bank tool. The IMDCT can be configured to support 120, 128, 240, 256, 480, 512, 960 or 1024 spectral coefficients. The inputs to the filter bank tool are: the (inversely quantized) spectrum; the filter bank control information. The output(s) of the filter bank tool is (are): the time domain reconstructed audio signal(s).

The time-warped filter bank / block switching tool replaces the normal filter bank / block switching tool when the time warping mode is enabled. The filter bank is the same (IMDCT) as for the normal filter bank; additionally, the windowed time domain samples are mapped from the warped time domain to the linear time domain by time-varying resampling.
The inputs to the time-warped filter bank tool are: the inversely quantized spectrum; the filter bank control information; the time warping control information. The output(s) of the filter bank tool is (are): the linear time domain reconstructed audio signal(s).

The enhanced SBR (eSBR) tool regenerates the high band of the audio signal. It is based on replication of the harmonic sequences, truncated during encoding. It adjusts the spectral envelope of the generated high band, applies inverse filtering, and adds noise and sinusoidal components in order to recreate the spectral characteristics of the original signal. The input to the eSBR tool is: the quantized envelope data; miscellaneous control data; a time domain signal from the frequency domain core decoder or the ACELP/TCX core decoder. The output of the eSBR tool is either: a time domain signal; or a QMF domain representation of a signal, as used, for example, by the MPEG Surround tool.

The MPEG Surround (MPEGS) tool produces multiple signals from one or more input signals by applying a sophisticated upmix procedure to the input signal(s), controlled by appropriate spatial parameters. In the USAC context, MPEGS is used for coding a multichannel signal by transmitting parametric side information alongside a transmitted downmix signal. The input to the MPEGS tool is: a downmixed time domain signal; or a QMF domain representation of a downmixed signal from the eSBR tool. The output of the MPEGS tool is: a multichannel time domain signal.

The signal classifier tool analyzes the original input signal and generates from it control information which triggers the selection of the different coding modes. The analysis of the input signal is implementation dependent and will try to choose the optimal core coding mode for a given input signal frame.
The output of the signal classifier can (optionally) also be used to influence the behavior of other tools, for example MPEG Surround, enhanced SBR, the time-warped filter bank and others. The input to the signal classifier tool is: the original, unmodified input signal; additional implementation dependent parameters. The output of the signal classifier tool is: a control signal to control the selection of the core codec (non-LP filtered frequency domain coding, LP filtered frequency domain coding or LP filtered time domain coding).

The ACELP tool provides a way to efficiently represent a time domain excitation signal by combining a long term predictor (adaptive codeword) with a pulse-like sequence (innovation codeword). The reconstructed excitation is sent through an LP synthesis filter to form a time domain signal. The input to the ACELP tool is: adaptive and innovation codebook indices; adaptive and innovation codebook gain values; other control data; inversely quantized and interpolated LPC filter coefficients. The output of the ACELP tool is: the time domain reconstructed audio signal.

The MDCT based TCX decoder tool is used to turn the weighted LP residual representation from the MDCT domain back into a time domain signal, and outputs a time domain signal including weighted LP synthesis filtering. The IMDCT can be configured to support 256, 512 or 1024 spectral coefficients. The input to the TCX tool is: the (inversely quantized) MDCT spectrum; inversely quantized and interpolated LPC filter coefficients. The output of the TCX tool is: the time domain reconstructed audio signal.
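The LP synthesis filtering mentioned for both ACELP and TCX can be illustrated with a generic all-pole synthesis filter. This is a schematic sketch under assumed coefficient conventions, not the normative ACELP/TCX implementation.

```python
def lp_synthesis(excitation, a):
    """All-pole LP synthesis filter: y[n] = x[n] - sum_k a[k] * y[n-1-k].
    `excitation` is the reconstructed excitation signal; `a` holds the
    (interpolated, inversely quantized) LPC filter coefficients."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, coeff in enumerate(a):
            if n - 1 - k >= 0:          # only past output samples exist
                acc -= coeff * y[n - 1 - k]
        y.append(acc)
    return y
```

With a single coefficient `a = [-0.5]`, for instance, an impulse excitation `[1.0, 0.0, 0.0]` produces the decaying response `[1.0, 0.5, 0.25]`, showing how the filter shapes the excitation into a time domain signal.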
The technology disclosed in ISO/IEC CD 23003-3, which is incorporated herein by reference, allows the definition of channel elements, which are, for example, single channel elements containing the payload for only a single channel, channel pair elements containing the payload for two channels, or LFE (Low Frequency Enhancement) channel elements containing the payload for one LFE channel.

Of course, the USAC codec is not the only codec able to code and transfer, via a data stream, the information of a more complicated audio content composed of more than one or two audio channels or audio objects. Accordingly, the USAC codec merely served as a concrete example.

Fig. 6 shows a more general example of an encoder and a decoder, respectively, both represented in a common scenario where the encoder encodes audio content 10 into a data stream 12, with the decoder decoding the audio content, or at least a part thereof, from the data stream 12. The result of the decoding, i.e. the reconstruction, is indicated at 14. As illustrated in Fig. 6, the audio content 10 may be composed of a number of audio signals 16. For example, the audio content 10 may be a spatial audio scene composed of a number of audio channels 16. Alternatively, the audio content 10 may represent a conglomeration of audio signals 16, with the audio signals 16 representing, individually and/or in groups, individual audio objects which may be put together into an audio scene at the discretion of the user of the decoder so as to obtain the reconstruction 14 of the audio content 10 in the form of, for example, a spatial audio scene for a specific loudspeaker configuration. The encoder encodes the audio content 10 in units of consecutive time periods. Such a time period is exemplarily shown at 18 in Fig. 6. The encoder encodes the consecutive periods 18 of the audio content 10 in the same manner: that is, the encoder inserts into the data stream 12 one frame 20 per time period 18.
In doing so, the encoder decomposes the audio content within the respective time period 18 into frame elements, the number and meaning/type of which are the same for each time period 18 and frame 20, respectively. With respect to the USAC codec described above, for example, the encoder encodes the same pair of audio signals 16 in time periods 18 into a channel pair element of the elements 22 of the frames 20, while using another coding principle, such as single channel coding, for another audio signal 16 so as to obtain a single channel element 22, and so forth. Parametric side information for obtaining an upmix of audio output signals from a downmix audio signal as defined by one or more frame elements 22 is collected so as to form another frame element within the frame 20. In this case, the frame element conveying this side information relates to, or forms a kind of extension data for, the other frame elements. Naturally, such extensions are not restricted to multi-channel or multi-object side information.

One possibility is to indicate within each frame element 22 of which type the respective frame element is. Advantageously, such a procedure allows for dealing with future extensions of the data stream syntax. Decoders which are not able to deal with certain frame element types would simply skip the respective frame elements within the data stream by exploiting respective length information inside these frame elements. Moreover, it is possible to allow for standard conforming decoders of different types: some would be able to understand a first set of types, while others would understand and be able to deal with another set of types; frame element types beyond those would simply be discarded by the respective decoders.
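The skipping just described can be sketched as follows. The byte-level layout (one type byte and one length byte per element) is a hypothetical illustration rather than the USAC syntax; it merely shows how length information makes unknown element types skippable.

```python
def parse_elements(data, handlers):
    """Parse a run of elements, skipping those of unknown type.
    data: concatenated elements, each laid out (for illustration) as
    1 type byte + 1 length byte + payload bytes.
    handlers: dict mapping the type codes this decoder understands to
    callables that decode the payload."""
    pos, results = 0, []
    while pos < len(data):
        elem_type, length = data[pos], data[pos + 1]
        payload = data[pos + 2:pos + 2 + length]
        if elem_type in handlers:
            results.append(handlers[elem_type](payload))
        # unknown type: the length field lets us jump over the payload
        # without being able to parse its contents
        pos += 2 + length
    return results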
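The skipping just described can be sketched as follows. The byte-level layout (one type byte and one length byte per element) is a hypothetical illustration rather than the USAC syntax; it merely shows how length information makes unknown element types skippable.

```python
def parse_elements(data, handlers):
    """Parse a run of elements, skipping those of unknown type.
    data: concatenated elements, each laid out (for illustration) as
    1 type byte + 1 length byte + payload bytes.
    handlers: dict mapping the type codes this decoder understands to
    callables that decode the payload."""
    pos, results = 0, []
    while pos < len(data):
        elem_type, length = data[pos], data[pos + 1]
        payload = data[pos + 2:pos + 2 + length]
        if elem_type in handlers:
            results.append(handlers[elem_type](payload))
        # unknown type: the length field lets us jump over the payload
        # without being able to parse its contents
        pos += 2 + length
    return results
```

A decoder that only understands type 1 still parses a stream containing a foreign type 7 element correctly, at the cost of reading and discarding its bytes.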
Additionally, the encoder would be able to sort the frame elements at its discretion, such that decoders which are able to process such additional frame element types could be fed with the frame elements within the frames 20 in an order which, for example, minimizes buffering requirements within the decoder. Disadvantageously, however, the data stream would have to convey the frame element type information on a frame element by frame element basis, and this need, in turn, negatively affects the compression rate of the data stream 12 on the one hand, and the decoding complexity on the other hand, since additional parsing to inspect the frame element type information takes place within each frame element. Furthermore, in order to allow frame elements to be skipped, the data stream 12 has to convey the aforementioned length information with respect to the frame elements which are potentially to be skipped. This conveyance, in turn, reduces the compression efficiency. Naturally, it would be possible to fix the order of the frame elements 22 otherwise, such as by convention, but such a procedure would prevent decoders from being free to rearrange the frame elements in response to, for example, specific properties of future extension frame element types requiring or suggesting a different order among the frame elements. Moreover, it would be favorable if the conveyance of the length information could be performed more effectively.

Accordingly, there is a need for another data stream, encoder and decoder concept, respectively. It is therefore an object of the present invention to provide a data stream, an encoder and a decoder which solve the above problem and allow a more efficient way of conveying length information. This object is achieved by the subject matter of the pending independent claims.
The present invention is based on the finding that frame elements which are to be made available for skipping may be transmitted more efficiently if default payload length information is transmitted separately within a configuration block, with the length information within the frame elements, in turn, being subdivided into a default payload length flag followed, if the default payload length flag is not set, by a payload length value explicitly coding the payload length of the respective frame element. If the default payload length flag is set, however, an explicit transmission of the payload length may be avoided. Rather, any frame element whose default extension payload length flag is set has the default payload length, and any frame element whose default extension payload length flag is not set has a payload length corresponding to the payload length value. By this measure, the transmission efficiency is increased.

According to an embodiment of the present application, the data stream syntax is designed to take advantage of the finding that a better compromise between a low data stream rate and low decoding overhead on the one hand and flexibility in the positioning of the frame elements on the other hand may be obtained if each frame of the sequence of frames of the data stream comprises a sequence of N frame elements, while the data stream comprises a configuration block comprising a field indicating the number of elements N and a type indication syntax portion indicating, for each element position of the sequence of N element positions, an element type out of a plurality of element types, with, in the sequences of N frame elements of the frames, each frame element being of the element type indicated, by the type indication portion, for the respective element position at which the respective frame element is positioned within the sequence of
the N frame elements of the respective frame in the data stream. Accordingly, the frames are equally structured in that each frame comprises the same sequence of N frame elements of the frame element types indicated by the type indication syntax portion, positioned within the data stream in the same sequential order. This sequential order is commonly adjustable for the sequence of frames by use of the type indication syntax portion, which indicates, for each element position of the sequence of N element positions, an element type out of a plurality of element types. By this measure, the frame element types may be arranged in any order, at the encoder's discretion, so as to choose the order which is most appropriate for the frame element types used, for example.

The plurality of frame element types may, for example, comprise an extension element type, with merely the frame elements of the extension element type comprising length information on the length of the respective frame element, such that decoders not supporting the specific extension element type are able to skip these frame elements of the extension element type, using the length information as a skip interval length. Frame elements of other element types need not comprise such length information. Decoders which are able to deal with these frame elements of the extension element type, on the other hand, duly process their content or part of their payload. If, in accordance with the specific embodiment just mentioned, the encoder is free to position these frame elements of the extension element type within the sequence of frame elements of the frames, the buffering overhead in the decoders may be minimized by choosing the order among the frame element types appropriately and signaling it within the type indication syntax portion.
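A minimal sketch of this configuration mechanism, with assumed type codes (the real syntax uses bit fields defined in the standard): the configuration is read once, and every frame of the stream is then expected to carry its N frame elements in exactly the signalled order.

```python
ELEMENT_TYPES = ("SCE", "CPE", "LFE", "EXT")  # assumed codes 0..3

def parse_config(type_codes):
    """type_codes: one element type code per element position; the number
    of elements N is implied by the length of the list. Returns the frame
    layout shared by all frames of the stream."""
    return [ELEMENT_TYPES[code] for code in type_codes]
```

For example, `parse_config([1, 0, 3])` returns `["CPE", "SCE", "EXT"]`, meaning every frame consists of a channel pair element, a single channel element and an extension element, in that order; no per-frame, per-element type field needs to be transmitted.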
Advantageous implementations of embodiments of the present invention are the subject of the dependent claims. Moreover, preferred embodiments of the present invention are described below with respect to the figures, among which:

Fig. 1 shows a schematic block diagram of an encoder and its input and output according to an embodiment;
Fig. 2 shows a schematic block diagram of a decoder and its input and output according to an embodiment;
Fig. 3 schematically shows a data stream according to an embodiment;
Figs. 4a to 4z and 4za to 4zc show pseudo code tables illustrating a concrete syntax of the data stream according to an embodiment;
Figs. 5a and 5b show a block diagram of a USAC encoder and decoder; and
Fig. 6 shows a typical encoder and decoder pair.

Fig. 1 shows an encoder 24 according to an embodiment. The encoder 24 serves to encode audio content 10 into a data stream 12. As described in the introductory portion of the specification of the present application, the audio content 10 may be a conglomerate of several audio signals 16. The audio signals 16 represent, for example, individual audio channels of a spatial audio scene. Alternatively, the audio signals 16 form audio objects of a set of audio objects together defining an audio scene for free mixing at the decoding side. The audio signals 16 are defined on a common time basis t, as illustrated at 26. That is, the audio signals 16 may relate to the same time interval and may, accordingly, be time-aligned relative to each other. The encoder 24 is configured to encode consecutive time periods 18 of the audio content 10 into a sequence of frames 20 such that each frame 20 represents a respective one of the time periods 18 of the audio content 10. The encoder 24 is configured to encode each time period in the same manner such that each frame 20 comprises a sequence of a number N of frame elements.
Within each frame 20, it holds true that each frame element 22 is of a respective one of a plurality of element types. In particular, the sequence of frames 20 is a composition of N sequences of frame elements 22, with each frame element 22 being of one of a plurality of element types, such that each frame 20 comprises one frame element 22 out of each of the N sequences of frame elements 22, respectively, and, for each sequence of frame elements 22, the frame elements 22 are of the same element type relative to each other. In the embodiments described below, the N frame elements within each frame 20 are arranged within the data stream 12 such that frame elements 22 positioned at a certain element position are of the same element type and form one of the N sequences of frame elements, sometimes called substreams in the following. That is, the first frame elements 22 of the frames 20 are of the same element type and form a first sequence (or substream) of frame elements, the second frame elements 22 of all the frames 20 are of an element type equal to each other and form a second sequence of frame elements, and so forth.

It is emphasized, however, that this aspect of the following embodiments is purely optional, and all the embodiments described below may be modified in this regard: for example, instead of keeping the order among the frame elements of the N substreams within each frame 20 constant, with the conveyance of the information on the element types of the substreams within the configuration block, all the embodiments explained below could be varied such that a respective element type indication is contained within each frame element's own syntax, so that the order among the substreams within each frame 20 may change between different frames. Naturally, such a modification would come at a disadvantage in terms of transmission efficiency, as explained below.
Even alternatively, the order could be fixed, but predefined by some convention, such that no indication within the configuration block would be needed. As will be described in more detail below, the substreams conveyed by the sequence of frames 20 carry information which allows a decoder to reconstruct the audio content. While some of the substreams may be indispensable, others are optional in some sense and may be skipped by some of the decoders. For example, some of the substreams may represent side information with respect to other substreams and may, for example, be dispensable. This will be explained in more detail below.

However, in order to allow decoders to skip some of the frame elements or, to be more precise, the frame elements of at least one of the sequences of frame elements, i.e. substreams, the encoder 24 is configured to write into the data stream 12 a configuration block 28 comprising default payload length information on a default payload length. Moreover, the encoder writes into the data stream 12, for each frame element 22 of this at least one substream, length information comprising, for at least a subset of the frame elements 22 of this at least one substream, a default payload length flag followed, if the default payload length flag is not set, by a payload length value. Any frame element of the at least one of the sequences of frame elements 22 whose default extension payload length flag is set has the default payload length, and any frame element of the at least one of the sequences of frame elements 22 whose default extension payload length flag 64 is not set has a payload length corresponding to the payload length value. By this measure, an explicit transmission of the payload length for each frame element of a skippable substream may be avoided.
Rather, depending on the type of the payload conveyed by such frame elements, the payload length statistics may be such that the transmission efficiency is significantly increased by referring to the default payload length instead of explicitly transmitting the payload length for each frame element again and again.

Accordingly, after having generally described the data stream, it is described in more detail below with respect to more specific embodiments. As already noted above, in these embodiments the constant, but adjustable, order among the substreams within the consecutive frames 20 is merely an optional feature and may be varied. According to an embodiment, for example, the encoder 24 is configured such that the plurality of element types comprises the following:

a) Frame elements of a single channel element type may, for example, be generated by the encoder 24 so as to represent a single audio signal. Accordingly, the sequence of frame elements 22 at a certain element position within the frames 20, such as the ith frame elements with 0 < i < N + 1, which in turn form the ith substream of frame elements, would together represent the consecutive time periods 18 of such a single audio signal. The audio signal thus represented could correspond directly to any one of the audio signals 16 of the audio content 10. Alternatively, however, and as will be described in more detail below, such a represented audio signal may be one channel of a downmix signal which, together with the payload data of frame elements of another element type positioned at another element position within the frames 20, yields a number of audio signals 16 of the audio content 10 which is greater than the number of channels of the just-mentioned downmix signal.
In the case of the embodiment described in more detail below, frame elements of such a single channel element type are denoted UsacSingleChannelElement. In the case of MPEG Surround and SAOC, for example, there is merely a single downmix signal, which may be mono, stereo, or even multichannel in the case of MPEG Surround. In the latter case, for example, a 5.1 downmix consists of two channel pair elements and one single channel element. In this case, the single channel element, as well as the two channel pair elements, are only part of the downmix signal. In the case of a stereo downmix, a channel pair element would be used.

b) Frame elements of a channel pair element type may be generated by the encoder 24 so as to represent a stereo pair of audio signals. That is, frame elements 22 of that type positioned at a common element position within the frames 20 would together form a respective substream of frame elements which represent the consecutive time periods 18 of such a stereo audio pair. The stereo pair of audio signals thus represented could directly be any pair of the audio signals 16 of the audio content 10, or could represent, for example, a downmix signal which, together with the payload data of frame elements of another element type positioned at another element position, yields a number of audio signals 16 of the audio content 10 which is greater than two. In the embodiment described in more detail below, frame elements of such a channel pair element type are denoted UsacChannelPairElement.

c) In order to convey information on audio signals 16 of the audio content 10 which need less bandwidth, such as subwoofer channels or the like, the encoder 24 may support frame elements of a specific type, with frame elements of that type positioned at a common element position representing, for example, the consecutive time periods 18 of a single audio signal.
This audio signal can directly be any one of the audio signals 16 of the audio content 10, or it can be part of a downmix signal as described above with respect to the single channel element type and the channel pair element type. In the embodiment described in more detail below, frame elements of such a specific frame element type are denoted UsacLfeElement. d) Frame elements of an extension element type can be generated by the encoder 24 to convey side information within the data stream so as to enable the decoder to upmix any of the audio signals represented by frame elements of any of the types a, b and/or c to obtain a greater number of audio signals. Frame elements of such an extension element type, positioned at a certain common element position within the frames 20, would accordingly convey side information relating to the consecutive time periods 18, which enables upmixing the respective time period of one or more audio signals represented by any of the other frame elements so as to obtain the respective time period of a higher number of audio signals, where the latter may correspond to the original audio signals 16 of the audio content 10. Examples of such side information are parametric side information such as, for example, MPS or SAOC side information. According to the embodiment described in detail below, the available element types consist merely of the four element types described above, but other element types may be available as well. On the other hand, only one or two of the element types a to c may be available.
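The four element types a) to d) just described can be summarized in a small sketch. The following is illustrative only (the numeric or symbolic identifiers are an assumption, not the normative syntax); the names mirror the syntax elements of the embodiment, such as UsacSingleChannelElement:

```python
from enum import Enum

# Illustrative sketch of the four element types a) to d) described above;
# the short names SCE/CPE/LFE/EXT are an assumption for this sketch only.
class ElementType(Enum):
    SCE = "UsacSingleChannelElement"  # type a: a single audio signal
    CPE = "UsacChannelPairElement"    # type b: a stereo pair of audio signals
    LFE = "UsacLfeElement"            # type c: low-bandwidth signal, e.g. subwoofer
    EXT = "UsacExtElement"            # type d: extension payload (side information)

# A 5.1 downmix as described above consists of two channel pair elements
# and one single channel element:
downmix_5_1 = [ElementType.CPE, ElementType.CPE, ElementType.SCE]
print([e.value for e in downmix_5_1])
```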
As became clear from the discussion above, omitting frame elements 22 of the extension element type from the data stream 12, or neglecting these frame elements in decoding, does not render the reconstruction of the audio content 10 completely impossible: at least, the remaining frame elements of the other element types convey enough information to yield audio signals. These audio signals do not necessarily correspond to the original audio signals of the audio content 10 or a subset thereof, but can represent a kind of amalgam of the audio content 10. That is, frame elements of the extension element type can convey information (payload data) representing side information relating to one or more frame elements positioned at different element positions within the frames 20. In the embodiment described below, however, frame elements of the extension element type are not restricted to this kind of side information transmission. Rather, frame elements of the extension element type are, in the following, denoted UsacExtElement and are defined so as to convey payload data along with length information, the latter length information allowing decoders receiving the data stream 12 to skip these frame elements of the extension element type in case, for example, the decoder is unable to process the respective payload data within these frame elements. This is described in more detail below. Before proceeding with the description of the encoder of Fig. 1, however, it should be noted that there are several alternative possibilities for the element types described above. This is especially true for the extension element type described above.
In particular, in case the extension element type is configured such that its payload data can be skipped by decoders that are, for example, not able to process the respective payload data, the payload data of these frame elements of the extension element type could be of any kind. This payload data could form side information relating to the payload data of frame elements of other frame element types, or it could form self-contained payload data representing another audio signal, for example. Furthermore, even in case the payload data of frame elements of the extension element type represents side information on the payload data of frame elements of other frame element types, the payload data of these frame elements of the extension element type is not restricted to the kinds described above, namely multichannel or multi-object side information. Multichannel side information payload accompanies, for example, a downmix signal represented by any of the frame elements of another element type, with spatial cues such as binaural cue coding parameters, namely inter-channel coherence values (ICC, inter-channel coherence), inter-channel level differences (ICLD, inter-channel level differences), and/or inter-channel time differences (ICTD, inter-channel time differences) and, optionally, channel prediction coefficients, which parameters are known in the art from, for example, the MPEG Surround standard. The spatial cue parameters just mentioned can, for example, be transmitted within the payload data of the frame elements of the extension element type at a time/frequency resolution, that is, one parameter per time/frequency tile of a time/frequency grid.
In the case of multi-object side information, the payload data of frame elements of the extension element type may contain similar information, such as inter-object cross-correlation parameters (IOC, inter-object cross-correlation) and object level differences (OLD, object level differences), as well as downmix parameters revealing how the original audio signals have been downmixed into the channel(s) of a downmix signal represented by any of the frame elements of another element type. The latter parameters are, for example, known in the art from the SAOC standard. However, an example of different side information that the payload data of frame elements of the extension element type could represent is, for example, SBR data for parametrically encoding an envelope of the high frequency part of an audio signal represented by any of the frame elements of the other frame element types positioned at different element positions within the frames 20, enabling, for example, spectral band replication by using the low frequency part obtained from this audio signal as a basis for the high frequency part, with the envelope of the high frequency part thus obtained being shaped according to the envelope conveyed by the SBR data. More generally speaking, the payload data of frame elements of the extension element type could convey side information for modifying audio signals represented by frame elements of any of the other element types positioned at different element positions within the frames 20, either in the time domain or in the frequency domain, where the frequency domain may, for example, be a QMF domain or some other filter bank domain or transform domain. Proceeding with the description of the functionality of the encoder 24 of Fig.
1, the encoder is configured to encode into the data stream 12 the configuration block 28 comprising a field indicating the number of elements N and a type indication syntax portion indicating, for each element position in the sequence of N element positions, the respective element type. Accordingly, the encoder 24 is configured to encode, for each frame 20, the sequence of N frame elements 22 into the data stream 12 such that each frame element 22 of the sequence of N frame elements 22, which is positioned at a respective element position within the sequence of N frame elements 22 in the data stream 12, is of the element type indicated by the type indication portion for the respective element position. In other words, the encoder 24 forms N substreams, each of which is a sequence of frame elements 22 of a respective element type. That is, within each of these N substreams, the frame elements 22 are of equal element type, while frame elements of different substreams may be of a different element type. The encoder 24 is configured to multiplex all these frame elements into the data stream 12 by concatenating all N frame elements of these substreams relating to a common time period 18 so as to form one frame 20. Accordingly, in the data stream 12 these frame elements 22 are arranged in frames 20. Within each frame 20, the representatives of the N substreams, that is, the N frame elements relating to the same time period 18, are arranged in the static sequential order defined by the sequence of element positions and the type indication syntax portion in the configuration block 28, respectively. By use of the type indication syntax portion, the encoder 24 is able to freely select the order in which the frame elements 22 of the N substreams are arranged within the frames 20. By this measure, the encoder 24 is able to keep, for example, the buffer overhead at the decoding side as low as possible.
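The multiplexing just described can be sketched as follows. This is a minimal illustration of the frame arrangement, not the normative USAC bitstream syntax; the function name and the use of plain lists for frame elements are assumptions of this sketch:

```python
def multiplex(substreams):
    """Concatenate, for each time period t, the t-th frame element of each of
    the N substreams, always in the same static element-position order, so as
    to form one frame per time period."""
    num_periods = len(substreams[0])
    return [[stream[t] for stream in substreams] for t in range(num_periods)]

# Two substreams (N = 2) covering three consecutive time periods:
frames = multiplex([["a0", "a1", "a2"], ["b0", "b1", "b2"]])
print(frames)  # [['a0', 'b0'], ['a1', 'b1'], ['a2', 'b2']]
```

Note that the order of substreams within a frame is fixed for the whole stream; it is signaled once in the configuration block rather than per frame.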
For example, a substream of frame elements of the extension element type conveying side information for frame elements of another substream (base substream), which are of a non-extension element type, can be positioned at an element position within the frames 20 immediately succeeding the element position at which the frame elements of this base substream are located within the frames 20. By this measure, the buffering time during which the decoding side has to buffer results, or intermediate results, of the decoding of the base substream for an application of the side information thereto is kept low, and the buffer overhead may be reduced. In case the side information of the payload data of the frame elements of a substream of the extension element type is applied to an intermediate result, such as a frequency domain representation of the audio signal represented by another substream of frame elements 22 (base substream), positioning the substream of frame elements 22 of the extension element type such that it immediately succeeds the base substream not only minimizes the buffer overhead, but also the time for which the decoder may have to interrupt further processing of the reconstruction of the represented audio signal, because, for example, the payload data of the frame elements of the extension element type is intended to modify the reconstruction of the audio signal relative to the representation of the base substream. However, it may also be worthwhile to place a dependent extension substream before the base substream representing the audio signal to which the extension substream refers. For example, the encoder 24 is free to position the extension payload substream upstream within the data stream relative to a substream of a channel element type.
For example, the extension payload of substream i could convey dynamic range control (DRC) data and be transmitted before, that is, at an earlier element position i, relative to the coding of the corresponding audio signal, such as via frequency domain (FD) coding, within the channel substream at element position i+1, for example. Then, the decoder is able to use the DRC immediately when decoding and reconstructing the audio signal represented by the non-extension substream i+1. The encoder 24 as described so far represents a possible embodiment of the present application. However, Fig. 1 also shows a possible internal structure of the encoder, which should be understood merely as an illustration. As shown in Fig. 1, the encoder 24 may comprise a distributor 30 and a sequencer 32, between which various encoding modules 34a-e are connected in a manner described in more detail in the following. In particular, the distributor 30 is configured to receive the audio signals 16 of the audio content 10 and to distribute same onto the individual encoding modules 34a-e. The way the distributor 30 distributes the consecutive time periods 18 of the audio signals 16 onto the encoding modules 34a to 34e is static. In particular, the distribution may be such that each audio signal 16 is exclusively routed to one of the encoding modules 34a to 34e. An audio signal fed to the LFE encoder 34a is encoded by the LFE encoder 34a into a substream of frame elements 22 of type c (see above), for example. Audio signals fed to an input of the single channel encoder 34b are encoded by the latter into a substream of frame elements 22 of type a (see above), for example. Similarly, a pair of audio signals fed to an input of the channel pair encoder 34c is encoded by the latter into a substream of frame elements 22 of type b (see above), for example.
The just-mentioned encoding modules 34a to 34c are connected, with an input and output thereof, between the distributor 30 on the one hand and the sequencer 32 on the other hand. As shown in Fig. 1, however, the inputs of the encoder modules 34b and 34c are not only connected to the output interface of the distributor 30. Rather, they may be fed by an output signal of either of the encoding modules 34d and 34e. The latter encoding modules 34d and 34e are examples of encoding modules that are configured to encode a number of inbound audio signals into a downmix signal of a smaller number of downmix channels on the one hand, and a substream of frame elements 22 of type d (see above) on the other hand. As became clear from the above discussion, the encoding module 34d may be an SAOC encoder, and the encoding module 34e may be an MPS encoder. The downmix signals are forwarded to either of the encoding modules 34b and 34c. The substreams generated by the encoding modules 34a to 34e are forwarded to the sequencer 32, which sequences the substreams into the data stream 12 as described above. Accordingly, the encoding modules 34d and 34e have their input for the number of audio signals connected to the output interface of the distributor 30, while their substream output is connected to an input interface of the sequencer 32, and their downmix output is connected to the inputs of the encoding modules 34b and/or 34c, respectively. It should be noted that, in accordance with the description above, the existence of the multi-object encoder 34d and the multichannel encoder 34e was merely chosen for illustrative purposes, and either of these encoding modules 34d and 34e may be left out or replaced by another encoding module, for example. After having described the encoder 24 and its possible internal structure, a corresponding decoder is described with reference to Fig. 2. The decoder of Fig.
2 is generally indicated with reference sign 36 and has an input for receiving the data stream 12 and an output for outputting a reconstructed version 38 of the audio content 10 or an amalgam thereof. Accordingly, the decoder 36 is configured to decode the data stream 12 comprising the configuration block 28 and the sequence of frames 20 shown in Fig. 1, and to decode each frame 20 by decoding the frame elements 22 in accordance with the element type indicated, by the type indication portion, for the respective element position at which the respective frame element 22 is positioned within the sequence of N frame elements 22 of the respective frame 20 in the data stream 12. That is, the decoder 36 is configured to assign each frame element 22 to one of the possible element types depending on its element position within the current frame 20, rather than on any information within the frame element itself. By this measure, the decoder 36 obtains N substreams, the first substream being made up of the first frame elements 22 of the frames 20, the second substream being made up of the second frame elements 22 within the frames 20, the third substream being made up of the third frame elements 22 within the frames 20, and so forth. Before describing the functionality of the decoder 36 with respect to the frame elements of the extension element type in more detail, a possible internal structure of the decoder 36 of Fig. 2 is explained in more detail so as to correspond to the internal structure of the encoder 24 of Fig. 1. As described with respect to the encoder 24, the internal structure is to be understood as merely illustrative. In particular, as shown in Fig. 2, the decoder 36 may internally comprise a distributor 40 and an arranger 42, between which decoding modules 44a to 44e are connected. Each decoding module 44a to 44e is responsible for decoding a substream of frame elements 22 of a certain frame element type.
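The position-based assignment just described can be sketched as the inverse of the encoder-side multiplexing. This is an illustrative sketch under the assumption that frames and elements are plain Python lists; no type tag is read from the frame elements themselves:

```python
def demultiplex(frames, element_types):
    """Recover the N substreams from consecutive frames: element position i
    of every frame is assigned the element type element_types[i] taken from
    the configuration block, never from the frame element itself."""
    substreams = [[] for _ in element_types]
    for frame in frames:
        for i, element in enumerate(frame):
            substreams[i].append(element)
    return list(zip(element_types, substreams))

subs = demultiplex([["a0", "b0"], ["a1", "b1"]], ["SCE", "EXT"])
print(subs[1])  # ('EXT', ['b0', 'b1'])
```

Because the element type follows from the element position alone, the distributor needs no per-element inspection to route each substream to its decoding module.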
Accordingly, the distributor 40 is configured to distribute the N substreams of the data stream 12 onto the decoding modules 44a to 44e correspondingly. The decoding module 44a, for example, is an LFE decoder that decodes a substream of frame elements 22 of type c (see above) so as to obtain a narrowband (for example) audio signal at its output. Similarly, the single channel decoder 44b decodes an inbound substream of frame elements of type a (see above) so as to obtain a single audio signal at its output, and the channel pair decoder 44c decodes an inbound substream of frame elements 22 of type b (see above) so as to obtain a pair of audio signals at its output. The decoding modules 44a to 44c have their input and output connected between the output interface of the distributor 40 on the one hand and the input interface of the arranger 42 on the other hand. The decoder 36 may merely have the decoding modules 44a to 44c. The other decoding modules 44e and 44d are responsible for frame elements of the extension element type and are accordingly optional as far as conformance with the audio codec is concerned. If both or either of these extension modules 44e and 44d are missing, the distributor 40 is configured to skip the respective substreams in the data stream 12 as described in more detail below, and the reconstructed version 38 of the audio content 10 is merely an amalgam of the original version having the audio signals 16. If present, however, that is, if the decoder 36 supports SAOC and/or MPS extension frame elements, the multichannel decoder 44e may be configured to decode the substreams generated by the encoder 34e, while the multi-object decoder 44d is responsible for decoding the substreams generated by the multi-object encoder 34d. Accordingly, in case the decoding module 44e and/or 44d is present, a switch 46 may connect the output of either of the decoding modules 44c and 44b with a downmix signal input of the decoding module 44e and/or 44d.
The multichannel decoder 44e may be configured to upmix the inbound downmix signal using side information within the inbound substream from the distributor 40 so as to obtain an increased number of audio signals at its output. The multi-object decoder 44d may act accordingly, with the difference that the multi-object decoder 44d treats the individual audio signals as audio objects, while the multichannel decoder 44e treats the audio signals at its output as audio channels. The audio signals thus reconstructed are forwarded to the arranger 42, which arranges them so as to form the reconstruction 38. The arranger 42 may additionally be controlled by a user input 48, with the user input indicating, for example, an available loudspeaker configuration or a maximum allowed number of channels of the reconstruction 38. Depending on the user input 48, the arranger 42 may disable any of the decoding modules 44a to 44e, such as, for example, any of the extension modules 44d and 44e, even if present and even if extension frame elements are present in the data stream 12. Generally speaking, the decoder 36 may be configured to parse the data stream 12 and reconstruct the audio content based on a subset of the sequences of frame elements, that is, substreams, and, with respect to at least one of the sequences of frame elements 22 not belonging to the subset of the sequences of frame elements, to read from the configuration block 28, for the at least one of the sequences of frame elements 22, default payload length information on a default payload length, and, for each frame element 22 of the at least one of the sequences of frame elements 22, to read length information from the data stream 12, the reading of the length information comprising, for at least a subset of the frame elements 22 of the at least one of the sequences of frame elements 22, reading a default payload length flag followed, if the default payload length flag is not set, by reading a payload length value.
The decoder 36 may then, in parsing the data stream 12, skip any frame element of the at least one of the sequences of frame elements the default extension payload length flag of which is set, using the default payload length as the skip interval length, and any frame element of the at least one of the sequences of frame elements 22 the default extension payload length flag of which is not set, using a payload length corresponding to the payload length value as the skip interval length. In the embodiments described below, this mechanism is restricted to substreams of the extension element type only, but naturally such a mechanism or syntax portion could apply to more than one element type. Before describing further possible details of the decoder, encoder and data stream, respectively, it should be noted that, owing to the encoder's ability to interleave frame elements of substreams of the extension element type between frame elements of substreams of other than the extension element type, the buffer overhead of the decoder 36 may be reduced by the encoder 24 by suitably choosing the order among the substreams and the order among the frame elements of the substreams within each frame 20, respectively. Imagine, for example, that the substream inbound to the channel pair decoder 44c were positioned at the first element position within the frames 20, while the multichannel substream for the decoder 44e were positioned at the end of each frame. In that case, the decoder 36 would have to buffer the intermediate audio signal representing the downmix signal for the multichannel decoder 44e for a time period spanning the time between the arrival of the first frame element and the last frame element of each frame 20, respectively. Only then is the multichannel decoder 44e able to commence its processing.
This delay can be avoided by the encoder 24 by arranging the substream dedicated to the multichannel decoder 44e at the second element position of the frames 20, for example. On the other hand, the distributor 40 does not need to inspect each frame element as to its membership of any of the substreams. Rather, the distributor 40 is able to deduce the membership of a current frame element 22 of a current frame 20 of any of the N substreams merely from the configuration block and the type indication syntax portion contained therein. Reference is now made to Fig. 3, showing the data stream 12 comprising, as already described above, a configuration block 28 and a sequence of frames 20. Data stream portions to the right succeed data stream portions to the left when viewed in Fig. 3. In the case of Fig. 3, for example, the configuration block 28 precedes the frames 20 shown in Fig. 3, where, merely for illustrative purposes, only three frames 20 are shown completely in Fig. 3. Further, it should be noted that the configuration block 28 may be inserted into the data stream 12 between frames 20 on a periodic or intermittent basis so as to allow for random access points in streaming applications. Generally speaking, the configuration block 28 may be a simply connected portion of the data stream 12. The configuration block 28 comprises, as described above, a field 50 indicating the number of elements N, that is, the number of frame elements N within each frame 20 and the number of substreams multiplexed into the data stream 12 as described above. In the embodiment below, describing a concrete data stream syntax, field 50 is denoted numElements, and the configuration block 28 is called UsacConfig in the specific syntax example of Figs. 4a-z and za-zc that follows. Further, the configuration block 28 comprises a type indication syntax portion 52.
As already described above, this portion 52 indicates, for each element position, an element type out of a plurality of element types. As shown in Fig. 3, and as is the case with the following specific syntax example, the type indication syntax portion 52 may comprise a sequence of N syntax elements 54, with each syntax element 54 indicating the element type for the respective element position at which the respective syntax element 54 is positioned within the type indication syntax portion 52. In other words, the ith syntax element 54 within portion 52 may indicate the element type of the ith substream and of the ith frame element of each frame 20, respectively. In the subsequent concrete syntax example, this syntax element is denoted UsacElementType. Although the type indication syntax portion 52 could be contained within the data stream 12 as a contiguous or simply connected portion of the data stream 12, it is exemplarily shown in Fig. 3 that the elements 54 thereof are interleaved with other syntax element portions of the configuration block 28 that are present for each of the N element positions individually. In the embodiments described below, these interleaved syntax portions belong to the substream-specific configuration data 55, the meaning of which is described in more detail below. As already described above, each frame 20 is composed of a sequence of N frame elements 22. The element types of these frame elements 22 are not signaled by respective type indicators within the frame elements 22 themselves. Rather, the element types of the frame elements 22 are defined by their element position within each frame 20. The frame element 22 occurring first within frame 20, denoted frame element 22a in Fig. 3, has the first element position and is accordingly of the element type that is indicated for the first element position by the syntax portion 52 within the configuration block 28. The same applies to the following frame elements 22.
For example, the frame element 22b occurring immediately after the first frame element 22a within the data stream 12, that is, the one having element position 2, is of the element type indicated for the second element position by the syntax portion 52. In accordance with a specific embodiment, the syntax elements 54 are arranged within the data stream 12 in the same order as the frame elements 22 to which they refer. That is, the first syntax element 54, that is, the one occurring first in the data stream 12 and positioned leftmost in Fig. 3, indicates the element type of the first occurring frame element 22a of each frame 20, the second syntax element 54 indicates the element type of the second frame element 22b, and so forth. Naturally, the sequential order or arrangement of the syntax elements 54 within the data stream 12 and the syntax portion 52 may deviate from the sequential order of the frame elements 22 within the frames 20. Other permutations would also be feasible, although less preferred. For the decoder 36, this means that same may be configured to read this sequence of N syntax elements 54 from the type indication syntax portion 52. To be more precise, the decoder 36 reads field 50 so that the decoder 36 knows the number N of syntax elements 54 to be read from the data stream 12. As mentioned, the decoder 36 may be configured to associate the syntax elements, and the element types indicated thereby, with the frame elements 22 within the frames 20 such that the ith syntax element 54 is associated with the ith frame element 22. In addition to the above description, the configuration block 28 may comprise a sequence 55 of N configuration elements 56, with each configuration element 56 comprising configuration information for the element type for the respective element position at which the respective configuration element 56 is positioned within the sequence 55 of N configuration elements 56.
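The reading of the configuration block just described can be sketched as follows. This is an illustrative sketch only: the reader class and the reduction of each configuration element 56 to a single integer stub are assumptions of this sketch, not the normative syntax of UsacConfig:

```python
class ListReader:
    """Toy reader over a list of already-decoded integers; a stand-in for a
    real bitstream reader (hypothetical helper for this sketch)."""
    def __init__(self, values):
        self.values = list(values)
        self.pos = 0

    def read_uint(self):
        value = self.values[self.pos]
        self.pos += 1
        return value

def read_config_block(reader):
    """Read the field giving the number of elements N (cf. numElements,
    field 50), then, per element position, a type indicator (cf. syntax
    element 54, UsacElementType) interleaved with a type-specific
    configuration element (cf. configuration element 56, reduced to a stub)."""
    num_elements = reader.read_uint()       # field 50
    layout = []
    for _ in range(num_elements):
        element_type = reader.read_uint()   # syntax element 54
        config = reader.read_uint()         # configuration element 56 (stub)
        layout.append((element_type, config))
    return layout

# N = 2, followed by interleaved (type, config) pairs for both positions:
print(read_config_block(ListReader([2, 0, 10, 3, 42])))  # [(0, 10), (3, 42)]
```

The sketch mirrors the interleaving described above: type indicator and configuration element alternate, once per element position.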
In particular, the order in which the sequence of configuration elements 56 is written into the data stream 12 (and read therefrom by the decoder 36) may be the same order as used for the frame elements 22 and/or the syntax elements 54, respectively. That is, the configuration element 56 occurring first in the data stream 12 may comprise the configuration information for the first frame element 22a, the second configuration element 56 the configuration information for the frame element 22b, and so forth. As already noted above, the type indication syntax portion 52 and the element-position-specific configuration data 55 are shown in the embodiment of Fig. 3 as being interleaved with each other such that the configuration element 56 belonging to element position i is positioned in the data stream 12 between the type indicator 54 for element position i and that for element position i+1. In yet other words, the configuration elements 56 and the syntax elements 54 are arranged in the data stream alternately and are read therefrom alternately by the decoder 36, but other positionings of these data within the configuration block 28 of the data stream 12 would also be feasible, as noted above. By transmitting a configuration element 56 for each element position 1...N in the configuration block 28, respectively, the data stream allows for differently configuring frame elements belonging to different substreams and element positions, respectively, but being of the same element type. For example, a data stream 12 may comprise two single channel substreams and, accordingly, two single channel element frame elements within each frame 20. The configuration information for the two substreams may, however, be adjusted differently in the data stream 12. This, in turn, means that the encoder 24 of Fig.
1 is able to differently set coding parameters within the configuration information for these different substreams, and the single channel decoder 44b of the decoder 36 is controlled using these different coding parameters when decoding these two substreams. This is also true for the other decoding modules. More generally speaking, the decoder 36 is configured to read the sequence of N configuration elements 56 from the configuration block 28 and to decode the ith frame element 22 in accordance with the element type indicated by the ith syntax element 54 and using the configuration information comprised by the ith configuration element 56. For illustrative purposes, it is assumed in Fig. 3 that the second substream, that is, the substream composed of the frame elements 22b occurring at the second element position within each frame 20, is an extension substream composed of frame elements 22b of the extension element type. Of course, this is merely illustrative. Further, it is merely for illustrative purposes that the data stream or configuration block 28 comprises one configuration element 56 per element position, irrespective of the element type indicated for that element position by the syntax portion 52. In accordance with an alternative embodiment, for example, there may be one or more element types for which no configuration element is comprised by the configuration block 28, so that, in the latter case, the number of configuration elements 56 within the configuration block 28 may be smaller than N, depending on the number of frame elements of such element types occurring in the syntax portion 52 and the frames 20, respectively. In any case, Fig. 3 shows an additional example for the construction of configuration elements 56 relating to the extension element type. In the specific syntax embodiment explained subsequently, these configuration elements 56 are denoted UsacExtElementConfig.
Merely for completeness, it is noted that in the specific syntax embodiment explained subsequently, the configuration elements for the other element types are denoted UsacSingleChannelElementConfig, UsacChannelPairElementConfig and UsacLfeElementConfig. Before describing a possible structure of the configuration element 56 for the extension element type, however, reference is made to the portion of Fig. 3 showing a possible structure of a frame element of the extension element type, the second frame element 22b being shown here representatively. As shown therein, frame elements of the extension element type may comprise length information 58 on a length of the respective frame element 22b. The decoder 36 is configured to read, from each frame element 22b of the extension element type of each frame 20, this length information 58. If the decoder 36 is not able to process, or is instructed by the user not to process, the substream to which this frame element of the extension element type belongs, the decoder 36 skips the frame element 22b using the length information 58 as a skip interval length, that is, the length of the portion of the data stream to be skipped. In other words, the decoder 36 may use the length information 58 to compute the number of bytes, or any other suitable measure, defining the length of the data stream interval that is to be skipped until accessing, or visiting, the next frame element within the current frame 20 or the beginning of the next following frame 20, so as to proceed with further reading of the data stream 12. As will be described in more detail below, frame elements of the extension element type may be configured so as to accommodate future or alternative extensions or developments of the audio codec, and accordingly frame elements of the extension element type may have different statistical length distributions.
In order to exploit the possibility that, according to some applications, the structure elements of the extension element type of certain subflows are of constant length or have a rather narrow statistical length distribution, according to some applications of the present application the configuration elements 56 for the extension element type may comprise standard payload length information 60, as shown in Figure 3. In that case, it is possible for the structure elements 22b of the extension element type of the respective subflow to refer to this standard payload length information 60 contained within the respective configuration element 56, rather than explicitly transmitting the payload length. Specifically, as shown in Figure 3, in that case the length information 58 may comprise a conditional syntax portion 62 in the form of a standard payload length flag 64 followed, if the standard payload length flag 64 is not set, by an extension payload length value 66. Any structure element 22b of the extension element type has the standard extension payload length as indicated by information 60 in the corresponding configuration element 56 in case the standard extension payload length flag 64 of the length information 58 of the respective structure element 22b of the extension element type is set, and has an extension payload length corresponding to the extension payload length value 66 of the length information 58 of the respective structure element 22b in case the standard extension payload length flag 64 of the length information 58 of the respective structure element 22b of the extension element type is not set.
That is, explicit coding of the extension payload length value 66 can be avoided by encoder 24 whenever possible, by merely referring to the standard extension payload length as indicated by the standard payload length information 60 within the configuration element 56 of the corresponding subflow and element position, respectively. Decoder 36 acts as follows. It reads the standard payload length information 60 when reading the configuration elements 56. When reading the structure elements 22b of the corresponding subflow, decoder 36, in reading the length information of these structure elements, reads the standard extension payload length flag 64 and checks whether it is set or not. If the standard payload length flag 64 is not set, the decoder continues by reading the extension payload length value 66 of the conditional syntax portion 62 from the data stream in order to obtain the extension payload length of the respective structure element. If, however, the standard payload length flag 64 is set, decoder 36 sets the extension payload length of the respective structure element to be equal to the standard extension payload length as derived from information 60. Skipping by decoder 36 may then involve skipping the payload section 68 of the current structure element using the payload length just determined as the skip interval length, that is, the length of the portion of the data stream 12 that is to be skipped in order to access the next structure element 22 of the current structure 20 or the beginning of the next structure 20. Therefore, as previously described, the repeated transmission of the payload length of the structure elements of the extension element type of a certain subflow can be avoided by using the flag mechanism 64 whenever the payload length variance of these structure elements is quite low.
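The length-reading mechanism of flag 64 and value 66 can be sketched as follows. This is a minimal illustrative sketch in Python; the BitReader class, the function name and the 8-bit width of the explicit length field are assumptions for illustration, not the normative USAC syntax.

```python
class BitReader:
    """Minimal MSB-first bit reader over a byte buffer (illustrative)."""
    def __init__(self, data):
        self.data = data
        self.pos = 0  # current bit position

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value

def read_extension_payload_length(reader, default_length):
    """Mimics flag 64 / value 66: if the flag is set, the payload length
    is the standard length (information 60) from the configuration
    element; only otherwise is an explicit length transmitted
    (8 bits here, purely for illustration)."""
    if reader.read_bits(1):        # standard payload length flag 64
        return default_length      # standard length from information 60
    return reader.read_bits(8)     # explicit extension payload length 66
```

For instance, a structure element whose length information starts with a set flag bit yields the standard length at the cost of a single bit, whereas a cleared flag bit is followed by an explicitly coded value.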
However, since it is a priori unclear whether the payload carried by the structure elements of the extension element type of a certain subflow has such statistics concerning the payload length of these structure elements, and consequently whether it is worthwhile to transmit the standard payload length explicitly in the configuration element of such a subflow of structure elements of the extension element type, in accordance with a further application, the standard payload length information 60 is itself implemented by a conditional syntax portion comprising a flag 60a, called usacExtElementDefaultLengthPresent in the following specific syntax example, indicating whether an explicit transmission of the standard payload length takes place or not. Only if set does the conditional syntax portion comprise the explicit transmission 60b of the standard payload length, called usacExtElementDefaultLength in the following specific syntax example. Otherwise, the standard payload length is by default set to zero. In the latter case, the bit consumption of an explicit transmission of the standard payload length in the data stream is saved. That is, decoder 36 (and distributor 40, which is responsible for all the reading procedures described above and hereinafter) can be configured to, when reading the standard payload length information 60, read a standard payload length present flag 60a from data stream 12, check whether the standard payload length present flag 60a is set or not, and if the standard payload length present flag 60a is not set, set the standard extension payload length to zero, and if the standard payload length present flag 60a is set, explicitly read the standard extension payload length 60b from data stream 12 (namely, field 60b following flag 60a).
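The configuration-side counterpart, reading the standard payload length information 60 (present flag 60a plus optional field 60b), can be sketched likewise. This is an illustrative Python sketch; the bit source and the 8-bit field width are assumptions, not the normative syntax.

```python
def read_default_payload_length(bits):
    """bits is an iterator yielding 0/1 integers.

    Reads the present flag 60a; only if it is set is the explicit
    standard payload length 60b read (8 bits here for illustration);
    otherwise the standard payload length defaults to zero."""
    if not next(bits):             # flag 60a not set
        return 0                   # default standard payload length
    value = 0
    for _ in range(8):             # field 60b
        value = (value << 1) | next(bits)
    return value
```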
In addition, or alternatively, to the standard payload length mechanism, the length information 58 may comprise an extension payload present flag 70. Any structure element 22b of the extension element type whose extension payload present flag 70 of the length information 58 is not set consists merely of the extension payload present flag and nothing else; that is, there is no payload section 68. On the other hand, the length information 58 of any structure element 22b of the extension element type whose extension payload present flag 70 of the length information 58 is set moreover comprises the syntax portion 62 or 66 indicating the extension payload length of the respective structure element 22b, that is, the length of its payload section 68. In combination with the standard payload length mechanism, that is, in combination with the standard extension payload length flag 64, the extension payload present flag 70 allows each structure element of the extension element type two efficiently codable payload lengths, namely, zero on the one hand, and the standard payload length, that is, the most likely payload length, on the other hand. In parsing or reading the length information 58 of a current structure element 22b of the extension element type, decoder 36 reads the extension payload present flag 70 from the data stream 12, checks whether the extension payload present flag 70 is set, and if the extension payload present flag 70 is not set, stops reading the respective structure element 22b and continues with reading the next structure element 22 of the current structure 20, or begins with reading or parsing the next structure 20.
If, however, the extension payload present flag 70 is set, decoder 36 reads the syntax portion 62, or at least its part 66 (in case flag 64 is absent because the standard payload length mechanism is not available), and skips, if the payload of the current structure element 22 has to be skipped, the payload section 68 using the extension payload length of the respective structure element 22b of the extension element type as the skip interval length. As described above, structure elements of the extension element type may be provided in order to accommodate future extensions of the audio codec, or alternative extensions for which the current decoder is not suitable, and consequently structure elements of the extension element type should be configurable. Specifically, in accordance with an application, the configuration block 28 comprises, for each element position for which the type indication portion 52 indicates the extension element type, a configuration element 56 comprising the configuration information for the extension element type, the configuration information comprising, in addition or alternatively to the components highlighted above, an extension element type field 72 indicating a payload data type out of a plurality of payload data types. The plurality of payload data types, according to an application, comprises a multi-channel side information type and a multi-object side information type, in addition to other data types that are, for example, reserved for future developments. Depending on the payload data type indicated, configuration element 56 additionally comprises configuration data specific to the payload data type. Consequently, the structure elements 22b at the corresponding element position, and of the respective subflow, respectively, carry in their payload sections 68 payload data corresponding to the indicated payload data type.
In order to allow adaptation of the length of the payload-data-type-specific configuration data 74 to the payload data type, and to allow a reservation for payload data types of future developments, the specific syntax applications described below provide for configuration elements 56 of the extension element type additionally comprising a configuration element length value, called usacExtElementConfigLength, so that decoders 36 that are not informed of the payload data type indicated for the current subflow are able to skip configuration element 56 and its payload-data-type-specific configuration data 74 so as to access the immediately following portion of data stream 12, such as the element type syntax element 54 of the next element position (or, in an alternative application not shown, the configuration element of the next element position), or the beginning of the first structure following configuration block 28, or some other data, as will be shown in Figure 4a. Specifically, in the following specific syntax application, the configuration data of the multi-channel side information is contained in SpatialSpecificConfig, while the configuration data of the multi-object side information is contained within SaocSpecificConfig.
In accordance with the latter aspect, decoder 36 would be configured to, when reading configuration block 28, perform the following steps for each element position or subflow for which the type indication portion 52 indicates the extension element type: reading the configuration element 56, including reading the extension element type field 72 indicating the payload data type out of the plurality of available payload data types; if the extension element type field 72 indicates the multi-channel side information type, reading the configuration data of the multi-channel side information 74 as part of the configuration information from the data stream 12; and if the extension element type field 72 indicates the multi-object side information type, reading the configuration data of the multi-object side information 74 as part of the configuration information from the data stream 12. Then, when decoding the corresponding structure elements 22b, that is, those of the corresponding element position and subflow, respectively, decoder 36 would decode them by configuring the multi-channel decoder 44e using the configuration data of the multi-channel side information 74, and feeding the thus configured multi-channel decoder 44e with the payload data 68 of the respective structure elements 22b as multi-channel side information, in case the payload data type indicates the multi-channel side information type, and would decode the corresponding structure elements 22b by configuring the multi-object decoder 44d using the configuration data of the multi-object side information 74, and feeding the thus configured multi-object decoder 44d with the payload data 68 of the respective structure elements 22b, in case the payload data type indicates the multi-object side information type.
However, if an unknown payload data type is indicated by field 72, decoder 36 would skip the payload-data-type-specific configuration data 74 using the aforementioned configuration length value also comprised by the current configuration element. For example, decoder 36 could be configured to, for any element position for which the type indication portion 52 indicates the extension element type, read a configuration data length field 76 from the data stream 12 as part of the configuration information of the configuration element 56 for the respective element position, in order to obtain a configuration data length, and check whether the payload data type indicated by the extension element type field 72 of the configuration information of the configuration element for the respective element position belongs to a predetermined set of payload data types, being a subset of the plurality of payload data types. If the payload data type indicated by the extension element type field 72 of the configuration information of the configuration element for the respective element position belongs to the predetermined set of payload data types, decoder 36 would read the payload-data-type-dependent configuration data 74 as part of the configuration information of the configuration element for the respective element position from the data stream 12, and decode the structure elements of the extension element type at the respective element position within the structures 20 using the payload-data-type-dependent configuration data 74.
But if the payload data type indicated by the extension element type field 72 of the configuration information of the configuration element for the respective element position does not belong to the predetermined set of payload data types, the decoder skips the payload-data-type-dependent configuration data 74 using the configuration data length, and skips the structure elements of the extension element type at the respective element position within the structures 20 using the length information 58 therein. In addition, or alternatively, to the aforementioned mechanisms, the structure elements of a certain subflow can be configured to be transmitted in fragments instead of one complete payload item per structure. For example, the configuration elements of the extension element type could comprise a fragmentation use flag 78, and the decoder can be configured to, when reading structure elements 22 positioned at any element position for which the type indication portion indicates the extension element type, and for which the fragmentation use flag 78 of the configuration element is set, read fragmentation information 80 from the data stream 12, and use the fragmentation information to put the payload data of these structure elements of consecutive structures together. In the following specific syntax example, each structure element of the extension element type of a subflow for which the fragmentation use flag 78 is set comprises a pair of flags: a start flag indicating the beginning of a payload item of the subflow, and a stop flag indicating the end of a payload item of the subflow. These flags are called usacExtElementStart and usacExtElementStop in the following specific syntax example.
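The reassembly of fragmented payload items by means of such start/stop flags can be sketched as follows. This is an illustrative Python sketch; the tuple representation of the already-read flags and payload bytes is an assumption, not the normative syntax.

```python
def reassemble_fragments(fragments):
    """fragments: iterable of (start_flag, stop_flag, payload_bytes)
    read from consecutive structures 20 for one subflow whose
    fragmentation use flag 78 is set. The two flags play the roles of
    usacExtElementStart and usacExtElementStop. Yields each complete
    payload item once its final fragment has arrived."""
    buffer = bytearray()
    for start, stop, payload in fragments:
        if start:
            buffer = bytearray()   # a new payload item begins here
        buffer.extend(payload)
        if stop:
            yield bytes(buffer)    # payload item is complete
```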
In addition, or alternatively, to the above mechanisms, the same variable length code can be used to read the length information 80, the extension element type field 72 and the configuration data length field 76, thus decreasing the complexity of implementing the decoder, for example, and saving bits by needing additional bits merely in rarely occurring cases, such as future extension element types, larger extension element type lengths, and so forth. In the specific example explained in the sequel, this VLC code is derivable from Figure 4m. Summarizing the above, the following can apply to the decoder functionality: (1) reading of configuration block 28, and (2) reading/parsing the sequence of structures 20. Steps 1 and 2 are performed by decoder 36 and, more precisely, distributor 40. (3) Reconstruction of the audio content, which is restricted to those subflows, that is, to those sequences of structure elements at certain element positions, whose decoding is supported by decoder 36. Step 3 is carried out within decoder 36, for example, in the decoding modules (see Figure 2). Consequently, in step 1, decoder 36 reads the number 50 of subflows and of structure elements 22 per structure 20, respectively, as well as the element syntax portion 52 revealing the element type of each of these subflows and element positions, respectively. For parsing the data stream in step 2, decoder 36 then cyclically reads the structure elements 22 of the sequence of structures from the data stream 12. In doing so, decoder 36 skips structure elements, or the remaining parts/payload thereof, using the length information 58 as described above. In the third step, decoder 36 performs the reconstruction by decoding the structure elements that do not have to be skipped.
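A variable length code of this escape-coded kind, in the spirit of the escapedValue() element named in the specific syntax example below, can be sketched as follows. This is an illustrative Python sketch; read_bits is a hypothetical callable returning an n-bit unsigned integer, and the bit widths in the test are chosen freely.

```python
def escaped_value(read_bits, n_bits1, n_bits2, n_bits3):
    """Escape-coded integer: read a first field of n_bits1 bits; if it
    is all ones, add a second field of n_bits2 bits; if that field is
    all ones too, add a final field of n_bits3 bits. Small values thus
    cost few bits, while large values remain representable."""
    value = read_bits(n_bits1)
    if value == (1 << n_bits1) - 1:          # escape: extend the range
        extra = read_bits(n_bits2)
        value += extra
        if extra == (1 << n_bits2) - 1:      # second escape level
            value += read_bits(n_bits3)
    return value
```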
When deciding in step 2 which element positions and subflows are to be skipped, decoder 36 can inspect the configuration elements 56 within configuration block 28. In order to do so, decoder 36 can be configured to cyclically read the configuration elements 56 from the configuration block 28 of the data stream 12 in the same order as used for the element type indicators 54 and the structure elements 22 themselves. As noted above, the cyclic reading of the configuration elements 56 may be interleaved with the cyclic reading of the syntax elements 54. Specifically, decoder 36 can inspect the extension element type field 72 within the configuration elements 56 of the subflows of the extension element type. If the extension element type is not a supported type, decoder 36 skips the respective subflow and the structure elements 22 at the corresponding element positions within the structures 20. In order to lower the bit rate necessary for the transmission of the length information 58, decoder 36 is configured to inspect the configuration elements 56 of the subflows of the extension element type, and specifically the standard payload length information 60 contained therein, in step 1. In the second step, decoder 36 inspects the length information 58 of the structure elements 22 of the extension element type that are to be skipped. Specifically, decoder 36 first inspects flag 64. If set, decoder 36 uses the standard length, indicated for the respective subflow by the standard payload length information 60, as the remaining payload length to be skipped in order to continue with the cyclic reading/parsing of the structure elements of the structures. If flag 64, however, is not set, decoder 36 explicitly reads the payload length 66 from data stream 12.
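Steps 2 and 3 above, with length-based skipping of unsupported element positions, can be sketched as follows. This is an illustrative Python sketch; the byte-aligned layout with a one-byte length field, and the decoder callables, are assumptions, not the normative syntax.

```python
def parse_structure(data, pos, element_types, decoders):
    """Parse one structure 20. element_types is the per-position element
    type list obtained from the configuration block (syntax portion 52);
    decoders maps supported element types to decoding callables. Any
    position without a decoder is skipped via its length information."""
    outputs = []
    for elem_type in element_types:              # cyclic reading (step 2)
        length = data[pos]                       # length information 58
        payload = bytes(data[pos + 1:pos + 1 + length])
        if elem_type in decoders:                # supported: decode (step 3)
            outputs.append(decoders[elem_type](payload))
        pos += 1 + length                        # unsupported: just skip
    return outputs, pos
```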
Although not explicitly explained above, it should be made clear that decoder 36 may derive the number of bits or bytes to be skipped in order to access the next structure element of the current structure, or the next structure, by some additional computation. For example, decoder 36 can take into account whether the fragmentation mechanism is activated or not, as explained above with respect to flag 78. If activated, decoder 36 can take into account that the structure elements of subflows with flag 78 set in any case carry fragmentation information 80, and that, consequently, the payload data 68 starts later than it would in case the fragmentation flag 78 were not set. When decoding in step 3, the decoder acts as usual: that is, the individual subflows are subjected to the respective decoding mechanisms or decoding modules, as shown in Figure 2, with some subflows possibly forming the side information for other subflows, as has been explained above with respect to specific examples of extension subflows. With respect to other possible details concerning the functionality of the decoders, reference is made to the above discussion. For completeness, it is noted that decoder 36 can also skip the further parsing of the configuration elements 56 in step 1, namely for those element positions that are to be skipped because, for example, the extension element type indicated by field 72 does not belong to a supported set of extension element types. Then decoder 36 can use the configuration length information 76 in order to skip the respective configuration elements in the cyclic reading/parsing of the configuration elements 56, that is, skipping the respective number of bits/bytes in order to access the next syntax element of the data stream, such as the indicator 54 of the next element position.
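The skipping of configuration elements of unsupported extension element types by means of the configuration data length field 76 can be sketched as follows. This is an illustrative Python sketch; the byte-aligned layout with one-byte type and length fields is an assumption, not the normative syntax.

```python
def parse_ext_element_config(data, pos, supported_types):
    """Parse one simplified extension configuration element.

    Assumed layout: [1 byte: element type field 72]
                    [1 byte: configuration data length 76]
                    [configuration data 74].
    Returns (config_or_None, position after the element): unknown
    payload data types are skipped wholesale via the length field."""
    ext_type = data[pos]                      # field 72
    cfg_len = data[pos + 1]                   # field 76
    end = pos + 2 + cfg_len
    if ext_type in supported_types:
        config = bytes(data[pos + 2:end])     # would be parsed type-specifically
    else:
        config = None                         # unknown type: data 74 skipped
    return config, end
```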
Before proceeding with the specific syntax application mentioned above, it should be noted that the present invention is not restricted to implementations of unified speech and audio coding and its facets, such as switched core coding using a mixture of, or a switch between, AAC-like frequency-domain coding and LP coding using parametric coding (ACELP) and transform coding (TCX). Instead, the subflows mentioned above may represent audio signals using any coding scheme. In addition, although in the specific syntax application described below it is assumed that SBR is an option of the core codec used to represent audio signals of subflows of the single channel and channel pair element types, SBR may also not be an option of the latter element types, but merely be usable via extension element types. In the sequel, an example of a specific syntax for the data stream 12 is explained. It should be noted that the specific syntax example represents a possible implementation for the application of Figure 3, and the correspondence between the syntax elements of the following syntax and the structure of the data stream of Figure 3 is indicated in, or derivable from, the respective remarks in Figure 3 as well as the description of Figure 3. The basic aspects of the following specific example are now highlighted. In this regard, it should be noted that any additional details beyond those described above with respect to Figure 3 are to be understood as possible extensions of the application of Figure 3. All of these extensions can be individually built into the application of Figure 3. As a final note, it should be understood that the specific syntax example described below refers to the decoder and encoder environments of Figures 5a and 5b, respectively.
High-level information about the contained audio content, such as the sampling rate and the exact channel configuration, is present in the audio data stream. This makes the data stream more self-contained and makes the transport of configuration and payload easier when embedded in transport schemes that may have no means to explicitly transmit this information. The configuration structure contains a combined frame length and SBR sampling rate ratio index (coreSbrFrameLengthIndex). This guarantees efficient transmission of both values and ensures that non-meaningful combinations of frame length and SBR ratio cannot be signaled. The latter simplifies the implementation of a decoder. The configuration can be extended by means of a dedicated configuration extension mechanism. This prevents the bulky and inefficient transmission of configuration extensions as known from the MPEG-4 AudioSpecificConfig(). The configuration allows free signaling of the loudspeaker positions associated with each transmitted audio channel. The signaling of commonly used channel-to-loudspeaker mappings can be efficiently performed by means of a channelConfigurationIndex. The configuration of each channel element is contained in a separate structure such that each channel element can be configured independently. The SBR configuration data (the "SBR header") is split into an SbrInfo() and an SbrHeader(). For SbrHeader(), a standard version is defined (SbrDfltHeader()), which can be efficiently referenced in the data stream. This reduces the bit demand in places where retransmission of SBR configuration data is needed. Configuration changes most commonly applied to SBR can be efficiently signaled with the help of the SbrInfo() syntax element. The configuration for the parametric bandwidth extension (SBR) and the parametric stereo coding tools (MPS212, also known as MPEG Surround 2-1-2) is fully integrated into the USAC configuration structure.
This represents much better the way in which both technologies are actually employed in the standard. The syntax features an extension mechanism that allows the transmission of existing and future extensions to the codec. The extensions may be placed (that is, interleaved) with the channel elements in any order. This allows extensions to be read before or after a particular channel element to which the extension is to be applied. A default length can be defined for a syntax extension, which makes the transmission of constant-length extensions very efficient, because the extension payload length does not have to be transmitted every time. The common case of signaling a value with the help of an escape mechanism to extend the range of values, if needed, has been modularized into a dedicated genuine syntax element (escapedValue()), which is flexible enough to cover all desired escape value constellations and bit field extensions. Data stream configuration, UsacConfig() (Fig. 4a): UsacConfig() has been extended to contain information about the contained audio content, as well as everything needed for the complete decoder setup. The high-level information about the audio (sampling rate, channel configuration, output frame length) is gathered at the beginning for easy access by higher (application) layers. UsacChannelConfig() (Fig. 4b): These elements give information about the contained data stream elements and their mapping to loudspeakers. The channelConfigurationIndex allows an easy and convenient way of signaling one of a range of predefined mono, stereo or multichannel configurations that were considered practically relevant. For more elaborate configurations that are not covered by the channelConfigurationIndex, UsacChannelConfig() allows the free assignment of elements to loudspeaker positions out of a list of 32 loudspeaker positions, which cover all currently known loudspeaker positions
in all known home or cinema sound reproduction setups. This list of loudspeaker positions is a superset of the list featured in the MPEG Surround standard (see Table 1 and Figure 1 in ISO/IEC 23003-1). Four positions have been added to cover the recently introduced 22.2 loudspeaker setup (see Figures 3a, 3b, 4a and 4b). UsacDecoderConfig() (Fig. 4c): This element is at the heart of the decoder configuration and, as such, contains all further information needed by the decoder to interpret the data stream. Specifically, the structure of the data stream is defined here by stating the number of elements and their order in the data stream. A loop over all elements then allows the configuration of all elements of all types (single, pair, lfe, extension). UsacConfigExtension() (Fig. 4l): To account for future extensions, the configuration features a powerful mechanism to extend the configuration for configuration extensions not yet existing for USAC. UsacSingleChannelElementConfig() (Fig. 4d): This element configuration contains all the information needed to configure the decoder to decode one single channel. This is essentially the core-coder-related information and, if SBR is used, the SBR-related information. UsacChannelPairElementConfig() (Fig. 4e): In analogy to the above, this element configuration contains all the information needed to configure the decoder to decode one channel pair. In addition to the aforementioned core configuration and SBR configuration, it includes stereo-specific configurations, such as the exact type of stereo coding applied (with or without MPS212, residual, etc.). Note that this element covers all kinds of stereo coding options available in USAC. UsacLfeElementConfig() (Fig. 4f): The LFE element configuration does not contain configuration data, as an LFE element has a static configuration. UsacExtElementConfig() (Fig.
4k): This element configuration can be used to configure any kind of existing or future extension of the codec. Each extension element type has its own dedicated ID value. A length field is included in order to be able to conveniently skip configuration extensions unknown to the decoder. The optional definition of a default payload length further increases the coding efficiency of extension payloads present in the actual data stream. Extensions already envisioned to be combined with USAC include: MPEG Surround, SAOC, and some kind of FIL element as known from MPEG-4 AAC. UsacCoreConfig() (Fig. 4g): This element contains configuration data that impacts the core coder setup. Currently, these are switches for the time warping tool and the noise filling tool. SbrConfig() (Fig. 4h): In order to reduce the bit overhead produced by the frequent retransmission of sbr_header(), the default values of the sbr_header() elements that are typically kept constant are now carried in the configuration element SbrDfltHeader(). Furthermore, static SBR configuration elements are also carried in SbrConfig(). These static bits include flags for enabling or disabling particular features of the enhanced SBR, such as harmonic transposition or inter-TES. SbrDfltHeader() (Fig. 4i): This carries the elements of sbr_header() that are typically kept constant. Elements affecting things like amplitude resolution, crossover band and spectrum pre-flattening are now carried in SbrInfo(), which allows them to be efficiently changed on the fly. Mps212Config() (Fig. 4j): In analogy to the SBR configuration above, all setup parameters for the MPEG Surround 2-1-2 tools are assembled in this configuration. All elements of SpatialSpecificConfig() that are not relevant or are redundant in this context have been removed. Data stream payload, UsacFrame() (Fig.
4n): This is the outermost wrapper around the USAC payload in the data stream, and represents a USAC access unit. It contains a loop over all the contained channel elements and extension elements, as signaled in the configuration part. This makes the data stream format much more flexible in terms of what it can contain, and it is future-proof for any future extension. UsacSingleChannelElement() (Fig. 4o): This element contains all the data needed to decode a mono stream. The content is split into a core-coder-related part and an eSBR-related part. The latter is now much more closely tied to the core, which also reflects much better the order in which the data is needed by the decoder. UsacChannelPairElement() (Fig. 4p): This element comprises the data for all possible ways of coding a stereo pair. Specifically, all flavors of unified stereo coding are covered, ranging from legacy M/S based coding to fully parametric stereo coding with the help of MPEG Surround 2-1-2. The stereoConfigIndex indicates which flavor is actually used. The appropriate eSBR data and MPEG Surround 2-1-2 data are sent in this element. UsacLfeElement() (Fig. 4q): The former lfe_channel_element() was merely renamed in order to follow a consistent naming scheme. UsacExtElement() (Fig. 4r): The extension element was carefully designed to be maximally flexible but, at the same time, maximally efficient, even for extensions with a small (or often no) payload. The extension payload length is signaled so that decoders unaware of the extension can skip it. User-defined extensions can be signaled by means of a range of reserved extension types. Extensions can be placed freely in the order of elements. A range of extension elements has already been considered, including a mechanism to write fill bytes. UsacCoreCoderData() (Fig.
4s) This new element summarizes all information affecting the core coders and hence also contains the information from fd_channel_stream() and lpd_channel_stream(). StereoCoreToolInfo() (Fig. 4t) In order to ease the readability of the syntax, all stereo related information was captured in this element. It deals with the numerous dependencies of bits in the stereo coding modes. UsacSbrData() (Fig. 4x) The CRC functionality and legacy descriptive elements of scalable audio coding were removed from what used to be the sbr_extension_data() element. In order to reduce the overhead caused by frequent retransmission of SBR info and header data, their presence can be explicitly signaled. SbrInfo() (Fig. 4y) SBR configuration data that is frequently changed on the fly. This includes elements controlling things like amplitude resolution, crossover band and spectrum pre-flattening, which previously required the transmission of a complete sbr_header(). (see 6.3 in [N11660], "Efficiency") SbrHeader() (Fig. 4z) In order to maintain the capability of SBR to change the values in sbr_header() on the fly, it is now possible to carry an SbrHeader() inside UsacSbrData() in case values other than those sent in SbrDfltHeader() have to be used. The bs_header_extra mechanism was retained in order to keep the overhead as low as possible for the most common cases. sbr_data() (Fig. 4za) Again, remnants of scalable SBR coding were removed because they are not applicable in the USAC context. Depending on the number of channels, sbr_data() contains one sbr_single_channel_element() or one sbr_channel_pair_element(). usacSamplingFrequencyIndex This table is a superset of the table used in MPEG-4 to signal the sampling frequency of the audio codec. The table was extended to also cover the sampling rates that are currently used in the USAC operating modes. Some multiples of the sampling frequencies were also added.
channelConfigurationIndex This table is a superset of the table used in MPEG-4 to signal the channelConfiguration. It was extended to allow signaling of commonly used and envisioned future loudspeaker setups. The index into this table is signaled with 5 bits to allow for future extensions. usacElementType Only 4 element types exist, one for each of the four basic data stream elements: UsacSingleChannelElement(), UsacChannelPairElement(), UsacLfeElement(), UsacExtElement(). These elements provide the necessary top-level structure while maintaining all needed flexibility. usacExtElementType Inside UsacExtElement(), this element allows the signaling of a multitude of extensions. In order to be future proof, the bit field was chosen large enough to allow for all conceivable extensions. Out of the currently known extensions, a few are already proposed to be considered: fill element, MPEG Surround, and SAOC. usacConfigExtType If at some point it becomes necessary to extend the configuration, this can be handled by means of UsacConfigExtension(), which would then allow a type to be assigned to each new configuration. Currently the only type that can be signaled is a fill mechanism for the configuration. coreSbrFrameLengthIndex This table signals multiple configuration aspects of the decoder. In particular, these are the output frame length, the SBR ratio and the resulting core coder frame length (CCFL). At the same time it indicates the number of QMF analysis and synthesis bands used in SBR. stereoConfigIndex This table determines the inner structure of a UsacChannelPairElement(). It indicates the use of a mono or stereo core, the use of MPS212, whether stereo SBR is applied, and whether residual coding is applied in MPS212. By moving large parts of the eSBR header fields to a default header which can be referenced by means of a default header flag, the bit demand for sending eSBR control data was drastically reduced.
Bit fields of the old sbr_header() that were considered most likely to change in a real world system were outsourced to the SbrInfo() element, which now consists of only 4 elements covering a maximum of 8 bits. Compared to the sbr_header(), which consists of at least 18 bits, this saves 10 bits. The impact of this change on the overall bit rate is more difficult to assess because it depends heavily on the transmission rate of the eSBR control data in SbrInfo(). However, for the common use case where the SBR crossover is altered in a data stream, the bit saving can be as high as 22 bits per occurrence when sending an SbrInfo() instead of a fully transmitted sbr_header(). The output of the USAC decoder can be further processed by MPEG Surround (MPS) (ISO/IEC 23003-1) or SAOC (ISO/IEC 23003-2). If the SBR tool in USAC is active, a USAC decoder can typically be efficiently combined with a subsequent MPS/SAOC decoder by connecting them in the QMF domain in the same way as described for HE-AAC in ISO/IEC 23003-1 4.4. If a connection in the QMF domain is not possible, they need to be connected in the time domain. If MPS/SAOC side information is embedded into a USAC data stream by means of the usacExtElement mechanism (with usacExtElementType being ID_EXT_ELE_MPEGS or ID_EXT_ELE_SAOC), the time alignment between the USAC data and the MPS/SAOC data assumes the most efficient connection between the USAC decoder and the MPS/SAOC decoder. If the SBR tool in USAC is active and if MPS/SAOC employs a 64 band QMF domain representation (see ISO/IEC 23003-1 6.6.3), the most efficient connection is in the QMF domain. Otherwise, the most efficient connection is in the time domain. This corresponds to the time alignment for the combination of HE-AAC and MPS as defined in ISO/IEC 23003-1 4.4, 4.5 and 7.2.1.
The additional delay introduced by adding MPS decoding after USAC decoding is given by ISO/IEC 23003-1 4.5 and depends on whether HQ MPS or LP MPS is used, and on whether MPS is connected to USAC in the QMF domain or in the time domain. ISO/IEC 23003-1 4.4 clarifies the interface between USAC and MPEG systems. Every access unit delivered to the audio decoder from the systems interface shall result in a corresponding composition unit delivered from the audio decoder to the systems interface, that is, to the compositor. This shall include start-up and shut-down conditions, that is, when the access unit is the first or the last of a finite sequence of access units. For an audio composition unit, the Composition Time Stamp (CTS) of ISO/IEC 14496-1 7.1.3.5 specifies that the composition time applies to the n-th audio sample within the composition unit. For USAC, the value of n is always 1. Note that this applies to the output of the USAC decoder itself. In case a USAC decoder is, for example, combined with an MPS decoder, this needs to be taken into account for the composition units delivered at the output of the MPS decoder. If MPS/SAOC side information is embedded into a USAC data stream by means of the usacExtElement mechanism (with usacExtElementType being ID_EXT_ELE_MPEGS or ID_EXT_ELE_SAOC), the following restrictions can optionally be applied: The MPS/SAOC parameter sacTimeAlign (see ISO/IEC 23003-1 7.2.5) shall have the value 0. The sampling frequency of MPS/SAOC shall be the same as the sampling frequency of USAC. The MPS/SAOC parameter bsFrameLength (see ISO/IEC 23003-1 5.2) shall have one of the allowed values of a predetermined list. The syntax of the USAC payload data stream is shown in Figures 4n to 4r, the syntax of the subsidiary payload elements in Figures 4s to 4w, and the syntax of the enhanced SBR payload in Figures 4x to 4zc.
Brief Description of the Data Elements UsacConfig() This element contains information about the contained audio content as well as everything needed for the complete decoder setup. UsacChannelConfig() This element gives information about the contained data stream elements and their mapping to loudspeakers. UsacDecoderConfig() This element contains all further information required by the decoder to interpret the data stream. In particular, the SBR resampling ratio is signaled here, and the structure of the data stream is defined here by explicitly stating the number of elements and their order in the data stream. UsacConfigExtension() Configuration extension mechanism to extend the configuration for future configuration extensions of USAC. UsacSingleChannelElementConfig() Contains all information needed for configuring the decoder to decode one single channel. This is essentially the core coder related information and, if SBR is used, the SBR related information. UsacChannelPairElementConfig() In analogy to the above, this element configuration contains all information needed for configuring the decoder to decode one channel pair. In addition to the aforementioned core config and SBR configuration, this includes stereo specific configurations such as the exact kind of stereo coding applied (with or without MPS212, residual, etc.). This element covers all kinds of stereo coding options available in USAC. UsacLfeElementConfig() The LFE element configuration does not contain configuration data, as an LFE element has a static configuration. UsacExtElementConfig() This element configuration can be used for any kind of existing or future extension configuration of the codec. Each extension element type has its own dedicated type value. A length field is included in order to be able to skip configuration extensions unknown to the decoder.
UsacCoreConfig() Contains configuration data that has impact on the core coder setup. SbrConfig() Contains default values for the SBR configuration elements that are typically kept constant. Furthermore, static SBR configuration elements are also carried in SbrConfig(). These static bits include flags for enabling or disabling particular features of the enhanced SBR, such as harmonic transposition or inter-TES. SbrDfltHeader() This element carries a default version of the elements of SbrHeader() that can be referred to if no differing values for these elements are desired. Mps212Config() All setup parameters for the MPEG Surround 2-1-2 tools are assembled in this configuration. escapedValue() This element implements a general method for transmitting an integer value using a varying number of bits. It features a two-level escape mechanism which allows the range of representable values to be extended by successive transmission of additional bits. usacSamplingFrequencyIndex This index determines the sampling frequency of the audio signal after decoding. The values of usacSamplingFrequencyIndex and their associated sampling frequencies are described in Table C.
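The two-level escape mechanism of escapedValue() described above can be sketched as follows. This is an illustrative reading of the rule "read a first field; if it holds the all-ones escape value, read a second field and add it; if that again holds the all-ones escape value, read and add a third field". The BitReader helper and the field widths (2, 4, 8) are assumptions for the example, not the standard's reference code.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a list of bits (0/1)."""
    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def read(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value

def escaped_value(reader, n_bits1, n_bits2, n_bits3):
    """Two-level escape: small values cost n_bits1 bits; larger values
    escape into n_bits2 and, if needed, n_bits3 additional bits."""
    value = reader.read(n_bits1)
    if value == (1 << n_bits1) - 1:          # first escape value hit
        value_add = reader.read(n_bits2)
        value += value_add
        if value_add == (1 << n_bits2) - 1:  # second escape value hit
            value += reader.read(n_bits3)
    return value

# A value of 2 fits in the first 2-bit field ...
assert escaped_value(BitReader([1, 0]), 2, 4, 8) == 2
# ... while 3 (all ones) escapes into the 4-bit field: 3 + 5 = 8
assert escaped_value(BitReader([1, 1, 0, 1, 0, 1]), 2, 4, 8) == 8
```

Small values thus cost only n_bits1 bits, while the representable range extends to (2^n_bits1 - 1) + (2^n_bits2 - 1) + (2^n_bits3 - 1).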
Table C - Values and meanings of usacSamplingFrequencyIndex

usacSamplingFrequencyIndex  Sampling frequency
0x00                        96000
0x01                        88200
0x02                        64000
0x03                        48000
0x04                        44100
0x05                        32000
0x06                        24000
0x07                        22050
0x08                        16000
0x09                        12000
0x0a                        11025
0x0b                        8000
0x0c                        7350
0x0d                        reserved
0x0e                        reserved
0x0f                        57600
0x10                        51200
0x11                        40000
0x12                        38400
0x13                        34150
0x14                        28800
0x15                        25600
0x16                        20000
0x17                        19200
0x18                        17075
0x19                        14400
0x1a                        12800
0x1b                        9600
0x1c                        reserved
0x1d                        reserved
0x1e                        reserved
0x1f                        escape value

NOTE: The values of usacSamplingFrequencyIndex 0x00 up to 0x0e are identical to those of samplingFrequencyIndex 0x0 up to 0xe contained in the AudioSpecificConfig() specified in ISO/IEC 14496-3:2009. usacSamplingFrequency Output sampling frequency of the decoder coded as an unsigned integer value in case usacSamplingFrequencyIndex equals the escape value 0x1f. channelConfigurationIndex This index determines the channel configuration. If channelConfigurationIndex > 0, the index unambiguously defines the number of channels, the channel elements and the associated loudspeaker mapping according to Table Y. The names of the loudspeaker positions, the abbreviations used and the general position of the available loudspeakers can be deduced from Figures 3a, 3b and Figures 4a and 4b. bsOutputChannelPos This index describes the loudspeaker positions which are associated to a given channel according to Table XX. Figure Y indicates the loudspeaker positions in the 3D environment of the listener. In order to ease the understanding of the loudspeaker positions, Table XX also contains the loudspeaker positions according to IEC 100/1706/CDV, which are listed here for information to the interested reader.
Table - Values of coreCoderFrameLength, sbrRatio, outputFrameLength and numSlots depending on coreSbrFrameLengthIndex

Index  coreCoderFrameLength  sbrRatio (sbrRatioIndex)  outputFrameLength  Mps212 numSlots
0      768                   no SBR (0)                768                N/A
1      1024                  no SBR (0)                1024               N/A
2      768                   8:3 (2)                   2048               32
3      1024                  2:1 (3)                   2048               32
4      1024                  4:1 (1)                   4096               64
5-7    reserved

usacConfigExtensionPresent Signals the presence of extensions to the configuration. numOutChannels If the value of channelConfigurationIndex indicates that none of the predefined channel configurations is used, then this element determines the number of audio channels for which a specific loudspeaker position shall be associated. numElements This field contains the number of elements that will follow in the loop over element types in UsacDecoderConfig(). usacElementType[elemIdx] Defines the USAC channel element type of the element at position elemIdx in the data stream. Four element types exist, one for each of the four basic data stream elements: UsacSingleChannelElement(), UsacChannelPairElement(), UsacLfeElement(), UsacExtElement(). These elements provide the necessary top-level structure while maintaining all needed flexibility. The meaning of usacElementType is defined in Table A.

Table A - Value of usacElementType

usacElementType  Value
ID_USAC_SCE      0
ID_USAC_CPE      1
ID_USAC_LFE      2
ID_USAC_EXT      3

stereoConfigIndex This element determines the inner structure of a UsacChannelPairElement(). It indicates the use of a mono or stereo core, the use of MPS212, whether stereo SBR is applied, and whether residual coding is applied in MPS212 according to Table ZZ. This element also defines the values of the helper elements bsStereoSbr and bsResidualCoding.
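The coreSbrFrameLengthIndex table can be expressed as a simple lookup; the rows are internally consistent in that the output frame length equals the core coder frame length multiplied by the SBR ratio. A minimal sketch, with illustrative names (the tuple layout is an assumption, not the standard's data structure):

```python
# index: (coreCoderFrameLength, sbrRatio as (num, den) or None, outputFrameLength, mps212NumSlots)
CORE_SBR_FRAME_LENGTH = {
    0: (768,  None,   768,  None),   # no SBR
    1: (1024, None,  1024,  None),   # no SBR
    2: (768,  (8, 3), 2048, 32),
    3: (1024, (2, 1), 2048, 32),
    4: (1024, (4, 1), 4096, 64),
}

def output_frame_length(index):
    """outputFrameLength = coreCoderFrameLength * sbrRatio (or ccfl if no SBR)."""
    ccfl, ratio, _, _ = CORE_SBR_FRAME_LENGTH[index]
    if ratio is None:
        return ccfl
    num, den = ratio
    return ccfl * num // den

# Every row of the table is consistent with the ccfl * sbrRatio relation:
for idx, (_, _, out, _) in CORE_SBR_FRAME_LENGTH.items():
    assert output_frame_length(idx) == out
```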
Table ZZ - Values of stereoConfigIndex, their meaning and the implicit assignment of bsStereoSbr and bsResidualCoding

stereoConfigIndex  Meaning                   bsStereoSbr  bsResidualCoding
0                  regular CPE (no MPS212)   N/A          0
1                  single channel + MPS212   N/A          0
2                  two channels + MPS212     0            1
3                  two channels + MPS212     1            1

tw_mdct This flag signals the usage of the time-warped MDCT in this stream. noiseFilling This flag signals the usage of the noise filling of spectral holes in the FD core coder. harmonicSBR This flag signals the usage of the harmonic patching for SBR. bs_interTes This flag signals the usage of the inter-TES tool in SBR. dflt_start_freq This is the default value for the data stream element bs_start_freq, which is applied in case the flag sbrUseDfltHeader indicates that the default values for the SbrHeader() elements shall be assumed. dflt_stop_freq This is the default value for the data stream element bs_stop_freq, which is applied in case the flag sbrUseDfltHeader indicates that the default values for the SbrHeader() elements shall be assumed. dflt_header_extra1 This is the default value for the data stream element bs_header_extra1, which is applied in case the flag sbrUseDfltHeader indicates that the default values for the SbrHeader() elements shall be assumed. dflt_header_extra2 This is the default value for the data stream element bs_header_extra2, which is applied in case the flag sbrUseDfltHeader indicates that the default values for the SbrHeader() elements shall be assumed. dflt_freq_scale This is the default value for the data stream element bs_freq_scale, which is applied in case the flag sbrUseDfltHeader indicates that the default values for the SbrHeader() elements shall be assumed. dflt_alter_scale This is the default value for the data stream element bs_alter_scale, which is applied in case the flag sbrUseDfltHeader indicates that the default values for the SbrHeader() elements shall be assumed.
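Because bsStereoSbr and bsResidualCoding are implied by stereoConfigIndex rather than transmitted, a decoder can derive them by table lookup. A minimal illustrative sketch of Table ZZ (function and string names are assumptions for the example):

```python
def stereo_config(stereo_config_index):
    """Return (meaning, bsStereoSbr, bsResidualCoding); None = not applicable."""
    table = {
        0: ("regular CPE (no MPS212)", None, 0),
        1: ("single channel + MPS212", None, 0),
        2: ("two channels + MPS212",   0,    1),  # mono SBR on the downmix
        3: ("two channels + MPS212",   1,    1),  # stereo SBR after MPS212
    }
    return table[stereo_config_index]

# stereoConfigIndex 3 implies stereo SBR and residual coding:
assert stereo_config(3) == ("two channels + MPS212", 1, 1)
```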
dflt_noise_bands This is the default value for the data stream element bs_noise_bands, which is applied in case the flag sbrUseDfltHeader indicates that the default values for the SbrHeader() elements shall be assumed. dflt_limiter_bands This is the default value for the data stream element bs_limiter_bands, which is applied in case the flag sbrUseDfltHeader indicates that the default values for the SbrHeader() elements shall be assumed. dflt_limiter_gains This is the default value for the data stream element bs_limiter_gains, which is applied in case the flag sbrUseDfltHeader indicates that the default values for the SbrHeader() elements shall be assumed. dflt_interpol_freq This is the default value for the data stream element bs_interpol_freq, which is applied in case the flag sbrUseDfltHeader indicates that the default values for the SbrHeader() elements shall be assumed. dflt_smoothing_mode This is the default value for the data stream element bs_smoothing_mode, which is applied in case the flag sbrUseDfltHeader indicates that the default values for the SbrHeader() elements shall be assumed. usacExtElementType This element allows signaling of the data stream extension types. The meaning of usacExtElementType is defined in Table B.

Table B - Value of usacExtElementType

usacExtElementType                           Value
ID_EXT_ELE_FILL                              0
ID_EXT_ELE_MPEGS                             1
ID_EXT_ELE_SAOC                              2
/* reserved for ISO use */                   3-127
/* reserved for use outside of ISO scope */  128 and higher

NOTE: Application-specific usacExtElementType values are mandated to be in the space reserved for use outside of ISO scope. These are skipped by a decoder, as only a minimum of structure knowledge is needed by the decoder to skip these extensions. usacExtElementConfigLength Signals the length of the extension configuration in bytes (octets).
usacExtElementDefaultLengthPresent This flag signals whether a usacExtElementDefaultLength is conveyed in UsacExtElementConfig(). usacExtElementDefaultLength Signals the default length of the extension element in bytes. Only if the extension element in a given access unit deviates from this value does an additional length need to be transmitted in the data stream. If this element is not explicitly transmitted (usacExtElementDefaultLengthPresent == 0), then the value of usacExtElementDefaultLength shall be set to zero. usacExtElementPayloadFrag This flag signals whether the payload of this extension element may be fragmented and sent as several segments in consecutive USAC frames. numConfigExtensions If extensions to the configuration are present in UsacConfig(), this value signals the number of signaled configuration extensions. confExtIdx Index to the configuration extensions. usacConfigExtType This element allows signaling of configuration extension types. The meaning of usacConfigExtType is defined in Table D.

Table D - Value of usacConfigExtType

usacConfigExtType                            Value
ID_CONFIG_EXT_FILL                           0
/* reserved for ISO use */                   1-127
/* reserved for use outside of ISO scope */  128 and higher

usacConfigExtLength Signals the length of the configuration extension in bytes (octets). bsPseudoLr This flag signals that an inverse mid/side rotation shall be applied to the core signal prior to Mps212 processing.

Table - bsPseudoLr

bsPseudoLr  Meaning
0           Output of the core decoder is DMX/RES
1           Output of the core decoder is Pseudo L/R

bsStereoSbr This flag signals the usage of stereo SBR in combination with MPEG Surround decoding.

Table - bsStereoSbr

bsStereoSbr  Meaning
0            Mono SBR
1            Stereo SBR

bsResidualCoding Indicates whether residual coding is applied according to the table below. The value of bsResidualCoding is defined by stereoConfigIndex (see Table X).
Table X - bsResidualCoding

bsResidualCoding  Meaning
0                 no residual coding, core coder is mono
1                 residual coding, core coder is stereo

sbrRatioIndex Indicates the ratio between the core sampling rate and the sampling rate after eSBR processing. At the same time it indicates the number of QMF analysis and synthesis bands used in SBR according to the table below.

Table - Definition of sbrRatioIndex

sbrRatioIndex  sbrRatio  QMF band ratio (analysis:synthesis)
0              no SBR    -
1              4:1       16:64
2              8:3       24:64
3              2:1       32:64

elemIdx Index to the elements present in UsacDecoderConfig() and UsacFrame(). UsacConfig() The UsacConfig() contains information about the output sampling frequency and channel configuration. This information shall be identical to the information signaled outside of this element, for example in an MPEG-4 AudioSpecificConfig(). USAC output sampling frequency If the sampling rate is not one of the rates listed in the right column of Table 1, the sampling frequency dependent tables (code tables, scale factor band tables, etc.) must be deduced in order for the data stream payload to be parsed. Since a given sampling frequency is associated with only one sampling frequency table, and since maximum flexibility is desired in the range of possible sampling frequencies, the following table shall be used to associate an implied sampling frequency with the desired sampling frequency dependent tables.

Table 1 - Sampling frequency mapping

Frequency range (in Hz)  Use tables for sampling frequency (in Hz)
f >= 92017               96000
92017 > f >= 75132       88200
75132 > f >= 55426       64000
55426 > f >= 46009       48000
46009 > f >= 37566       44100
37566 > f >= 27713       32000
27713 > f >= 23004       24000
23004 > f >= 18783       22050
18783 > f >= 13856       16000
13856 > f >= 11502       12000
11502 > f >= 9391        11025
9391 > f                 8000

UsacChannelConfig() The channel configuration table covers the most common loudspeaker positions.
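The range-to-table mapping of Table 1 can be sketched as a threshold lookup: walk the lower bounds from highest to lowest and return the first table frequency whose range contains f. Function and variable names are illustrative assumptions.

```python
def table_sampling_frequency(f):
    """Map an output sampling frequency f (Hz) to the sampling frequency
    whose dependent tables shall be used (Table 1)."""
    thresholds = [  # (lower bound of range, table frequency)
        (92017, 96000), (75132, 88200), (55426, 64000), (46009, 48000),
        (37566, 44100), (27713, 32000), (23004, 24000), (18783, 22050),
        (13856, 16000), (11502, 12000), (9391, 11025),
    ]
    for lower_bound, table_freq in thresholds:
        if f >= lower_bound:
            return table_freq
    return 8000  # everything below 9391 Hz

# 57600 Hz is a valid USAC rate (index 0x0f) without tables of its own:
assert table_sampling_frequency(57600) == 64000
assert table_sampling_frequency(44100) == 44100
```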
For more flexibility, channels can be mapped to an overall selection of 32 loudspeaker positions found in modern loudspeaker setups in various applications (see Figures 3a, 3b). For each channel contained in the data stream, UsacChannelConfig() specifies the associated loudspeaker position to which this particular channel shall be mapped. The loudspeaker positions indexed by bsOutputChannelPos are listed in Table X. In case of multiple channel elements, the index i of bsOutputChannelPos[i] indicates the position in which the channel appears in the data stream. Figure Y gives an overview of the loudspeaker positions in relation to the listener. More precisely, the channels are numbered in the sequence in which they appear in the data stream, starting with 0 (zero). In the trivial case of a UsacSingleChannelElement() or UsacLfeElement(), a channel number is assigned to that channel and the channel count is increased by one. In case of a UsacChannelPairElement(), the first channel in that element (with index ch == 0) is numbered first, whereas the second channel in that same element (with index ch == 1) receives the next higher number, and the channel count is increased by two. It follows that numOutChannels shall be equal to or smaller than the accumulated sum of all channels contained in the data stream. The accumulated sum of all channels is equivalent to the number of all UsacSingleChannelElement()s plus the number of all UsacLfeElement()s plus twice the number of all UsacChannelPairElement()s. All entries in the array bsOutputChannelPos shall be mutually distinct in order to avoid double assignment of loudspeaker positions in the data stream. In the special case that channelConfigurationIndex is 0 and numOutChannels is smaller than the accumulated sum of all channels contained in the data stream, the handling of the non-assigned channels is outside the scope of this specification.
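The accumulated channel sum described above (each SCE and LFE contributes one channel, each CPE contributes two, extension elements contribute none) can be sketched as a small helper; the short type tags are illustrative stand-ins for the usacElementType values.

```python
def accumulated_channel_sum(element_types):
    """Accumulated sum of all channels contained in the data stream:
    SCE -> 1, LFE -> 1, CPE -> 2, EXT -> 0 channels."""
    channels_per_element = {"SCE": 1, "LFE": 1, "CPE": 2, "EXT": 0}
    return sum(channels_per_element[t] for t in element_types)

# A 5.1-style layout: one CPE (front L/R), one SCE (center),
# one CPE (surround L/R), one LFE -> 6 channels in total.
assert accumulated_channel_sum(["CPE", "SCE", "CPE", "LFE"]) == 6
```

numOutChannels is then compared against this sum when validating the bsOutputChannelPos assignment.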
Information about this can, for example, be conveyed by appropriate means in higher application layers or by specifically designed (private) extension payloads. UsacDecoderConfig() UsacDecoderConfig() contains all further information required by the decoder to interpret the data stream. First, the value of sbrRatioIndex determines the ratio between the core coder frame length (ccfl) and the output frame length. Following the sbrRatioIndex is a loop over all channel elements in the present data stream. For each iteration, the type of element is signaled in usacElementType[], immediately followed by its corresponding configuration structure. The order in which the various elements are present in UsacDecoderConfig() shall be identical to the order of the corresponding payloads in UsacFrame(). Each instance of an element can be configured independently. When reading each channel element in UsacFrame(), for each element the corresponding configuration of that instance, that is, with the same elemIdx, shall be used. UsacSingleChannelElementConfig() UsacSingleChannelElementConfig() contains all information needed for configuring the decoder to decode one single channel. SBR configuration data is only transmitted if SBR is actually employed. UsacChannelPairElementConfig() UsacChannelPairElementConfig() contains core coder related configuration data as well as SBR configuration data depending on the use of SBR. The exact type of stereo coding algorithm is indicated by stereoConfigIndex. In USAC, a channel pair can be encoded in various ways. These are: 1. Stereo core coder pair using traditional joint stereo coding techniques, extended by the possibility of complex prediction in the MDCT domain. 2. Mono core coder channel in combination with MPEG Surround based MPS212 for fully parametric stereo coding. Mono SBR processing is applied to the core signal. 3.
Stereo core coder pair in combination with MPEG Surround based MPS212, where the first core coder channel carries a downmix signal and the second channel carries a residual signal. The residual may be band limited to realize partial residual coding. Mono SBR processing is applied only to the downmix signal before MPS212 processing. 4. Stereo core coder pair in combination with MPEG Surround based MPS212, where the first core coder channel carries a downmix signal and the second channel carries a residual signal. The residual may be band limited to realize partial residual coding. Stereo SBR is applied to the reconstructed stereo signal after MPS212 processing. Options 3 and 4 can additionally be combined with a pseudo-LR channel rotation after the core decoder. UsacLfeElementConfig() Since the use of the time-warped MDCT and noise filling is not allowed for LFE channels, there is no need to transmit the usual core coder flags for these tools. They shall be set to zero. Also, the use of SBR is neither allowed nor meaningful in an LFE context. Hence, no SBR configuration data is transmitted. UsacCoreConfig() UsacCoreConfig() only contains flags to enable or disable the use of the time-warped MDCT and spectral noise filling on a global data stream level. If tw_mdct is set to zero, time warping shall not be applied. If noiseFilling is set to zero, spectral noise filling shall not be applied. SbrConfig() The SbrConfig() data stream element serves the purpose of signaling the exact eSBR setup parameters. On one hand, SbrConfig() signals the general employment of eSBR tools. On the other hand, it contains a default version of the SbrHeader(), the SbrDfltHeader(). The values of this default header shall be assumed if no differing SbrHeader() is transmitted in the data stream.
The background of this mechanism is that, typically, only one set of SbrHeader() values is applied in a data stream. The transmission of SbrDfltHeader() then allows this default set of values to be referred to very efficiently by using only one bit in the data stream. The possibility to vary the SbrHeader values on the fly is still retained by allowing the in-band transmission of a new SbrHeader in the data stream itself. SbrDfltHeader() The SbrDfltHeader() is what may be called the basic SbrHeader() template and shall contain the values for the predominantly used eSBR configuration. In the data stream, this configuration can be referred to by setting the sbrUseDfltHeader flag. The structure of SbrDfltHeader() is identical to that of SbrHeader(). In order to be able to distinguish between the values of SbrDfltHeader() and SbrHeader(), the bit fields in SbrDfltHeader() are prefixed with "dflt" instead of "bs". If the use of SbrDfltHeader() is indicated, then the SbrHeader() bit fields shall assume the values of the corresponding SbrDfltHeader(), that is, bs_start_freq = dflt_start_freq; bs_stop_freq = dflt_stop_freq; etc. (continuing for all elements in SbrHeader(), like: bs_xxx_yyy = dflt_xxx_yyy;). Mps212Config() The Mps212Config() resembles the SpatialSpecificConfig() of MPEG Surround and was in large parts deduced from it. It is, however, reduced in extent to contain only the information relevant for mono to stereo upmixing in the USAC context. Consequently, MPS212 configures only one OTT box. UsacExtElementConfig() UsacExtElementConfig() is a general container for configuration data of USAC extension elements. Each USAC extension has a unique type identifier, usacExtElementType, which is defined in Table X.
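The copy rule "bs_xxx_yyy = dflt_xxx_yyy" can be sketched with dictionaries: since SbrDfltHeader() is structurally identical to SbrHeader(), applying the default header is a pure prefix rename. This is an illustrative sketch only; the field subset shown is an assumption.

```python
def apply_dflt_header(dflt_header):
    """Derive SbrHeader() values from an SbrDfltHeader(): every 'dflt_'
    bit field supplies the corresponding 'bs_' bit field."""
    return {name.replace("dflt_", "bs_", 1): value
            for name, value in dflt_header.items()}

dflt = {"dflt_start_freq": 7, "dflt_stop_freq": 9, "dflt_noise_bands": 2}
assert apply_dflt_header(dflt) == {
    "bs_start_freq": 7, "bs_stop_freq": 9, "bs_noise_bands": 2,
}
```

When sbrUseDfltHeader is set, the decoder performs exactly this substitution instead of reading an in-band SbrHeader().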
For each UsacExtElementConfig(), the length of the contained extension configuration is transmitted in the variable usacExtElementConfigLength and allows decoders to safely skip over extension elements whose usacExtElementType is unknown. For USAC extensions which typically have a constant payload length, UsacExtElementConfig() allows the transmission of a usacExtElementDefaultLength. Defining a default payload length in the configuration allows highly efficient signaling of the usacExtElementPayloadLength inside UsacExtElement(), where the bit consumption needs to be kept low. In case of USAC extensions where a larger amount of data is accumulated and transmitted not on a per-frame basis but only every second frame or even more rarely, this data may be transmitted in fragments or segments spread over several USAC frames. This can be useful in order to keep the bit reservoir more equalized. The use of this mechanism is signaled by the flag usacExtElementPayloadFrag. The fragmentation mechanism is further explained in the description of usacExtElement in 6.2.X. UsacConfigExtension() UsacConfigExtension() is a general container for extensions of UsacConfig(). It provides a convenient way to amend or extend the information exchanged at the time of decoder initialization or setup. The presence of configuration extensions is indicated by usacConfigExtensionPresent. If configuration extensions are present (usacConfigExtensionPresent == 1), the exact number of these extensions follows in the bit field numConfigExtensions. Each configuration extension has a unique type identifier, usacConfigExtType, which is defined in Table X. For each UsacConfigExtension, the length of the contained configuration extension is transmitted in the variable usacConfigExtLength and allows the configuration data stream parser to safely skip over configuration extensions whose usacConfigExtType is unknown.
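The efficiency gain of a configured default payload length can be illustrated with a rough bit count: with a default, each frame spends one indicator bit, and an explicit length is spent only in frames whose payload deviates from the default. The fixed 8-bit length field below is a simplifying assumption standing in for the actual escapedValue() coding; function names are illustrative.

```python
def bits_for_lengths(lengths, default=None, explicit_bits=8):
    """Rough bit cost of signaling per-frame payload lengths.

    default=None     -> every frame carries an explicit length field.
    default=<bytes>  -> every frame carries a 1-bit indicator, plus an
                        explicit length field only when it deviates.
    """
    if default is None:
        return len(lengths) * explicit_bits
    return sum(1 + (explicit_bits if n != default else 0) for n in lengths)

# A mostly-constant extension payload: 9 frames of 20 bytes, one of 23.
lengths = [20] * 9 + [23]
assert bits_for_lengths(lengths) == 80               # always explicit
assert bits_for_lengths(lengths, default=20) == 18   # 10 flags + 1 explicit length
```

For payloads that rarely deviate from the default, the cost per frame thus approaches a single bit.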
Top-level payloads for the USAC audio object type Terms and Definitions UsacFrame() This block of data contains audio data for a time period of one USAC frame, related information and other data. As signaled in UsacDecoderConfig(), UsacFrame() contains numElements elements. These elements can contain audio data for one or two channels, audio data for low frequency enhancement, or extension payload. UsacSingleChannelElement() Abbreviation SCE. Syntactic element of the data stream containing coded data for a single audio channel. The single channel element basically consists of UsacCoreCoderData(), containing data for the FD or LPD core coder. In case SBR is active, UsacSingleChannelElement also contains SBR data. UsacChannelPairElement() Abbreviation CPE. Syntactic element of the data stream payload containing data for a pair of channels. The channel pair can be achieved either by transmitting two discrete channels or by one discrete channel and related Mps212 payload. This is signaled by means of stereoConfigIndex. UsacChannelPairElement further contains SBR data in case SBR is active. UsacLfeElement() Abbreviation LFE. Syntactic element that contains a low sampling frequency enhancement channel. LFEs are always encoded using the fd_channel_stream() element. UsacExtElement() Syntactic element that contains extension payload. The length of an extension element is either signaled as a default length in the configuration (UsacExtElementConfig()) or signaled in UsacExtElement() itself. If present, the extension payload is of type usacExtElementType, as signaled in the configuration. usacIndependencyFlag Indicates whether the current UsacFrame() can be decoded entirely without knowledge of information from previous frames, according to the table below.
Table - Meaning of usacIndependencyFlag

  value of usacIndependencyFlag | Meaning
  0 | Decoding of the data carried in UsacFrame() may require access to the previous UsacFrame().
  1 | Decoding of the data carried in UsacFrame() is possible without access to the previous UsacFrame().

NOTE: Please consult X.Y for recommendations on the use of the usacIndependencyFlag. usacExtElementUseDefaultLength Indicates whether the length of the extension element corresponds to usacExtElementDefaultLength, which was defined in UsacExtElementConfig(). usacExtElementPayloadLength Contains the length of the extension element in bytes. This value is only transmitted explicitly in the data stream if the length of the extension element in the present access unit deviates from the default value, usacExtElementDefaultLength. usacExtElementStart Indicates whether the present usacExtElementSegmentData starts a data block. usacExtElementStop Indicates whether the present usacExtElementSegmentData ends a data block. usacExtElementSegmentData The concatenation of all usacExtElementSegmentData from the UsacExtElement() of consecutive USAC structures, starting from the UsacExtElement() with usacExtElementStart == 1 up to and including the UsacExtElement() with usacExtElementStop == 1, forms one data block. In case a complete data block is contained in a single UsacExtElement(), usacExtElementStart and usacExtElementStop must both be set to 1. The data blocks are interpreted as a byte-aligned extension payload depending on usacExtElementType according to the following Table:

Table - Interpretation of the data blocks for decoding of the USAC extension payload

  usacExtElementType | The concatenated usacExtElementSegmentData represents:
  ID_EXT_ELE_FILL | Series of fill_byte
  ID_EXT_ELE_MPEGS | SpatialFrame()
  ID_EXT_ELE_SAOC | SaocFrame()
  unknown | Unknown data. The data block must be discarded.

fill_byte Octet of bits that can be used to pad the data stream with bits that carry no information.
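The saving obtained from usacExtElementUseDefaultLength can be sketched as follows. The BitReader class and the 8-bit explicit length field are assumptions made for the sketch; the standard carries the explicit length as an escapedValue() of one or three octets, as described further below.

```python
# Illustrative sketch of the per-element length signaling: one bit
# (usacExtElementUseDefaultLength-like); only when that bit is 0 does an
# explicit payload length (usacExtElementPayloadLength-like) follow.

class BitReader:
    """Minimal MSB-first bit reader over a byte string (illustrative)."""
    def __init__(self, data: bytes):
        self.bits = "".join(f"{b:08b}" for b in data)
        self.pos = 0
    def read(self, n: int) -> int:
        v = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return v

def read_ext_element_length(r: BitReader, default_length: int) -> int:
    use_default = r.read(1)       # usacExtElementUseDefaultLength
    if use_default:
        return default_length     # only 1 bit spent in this structure
    return r.read(8)              # explicit length (8 bits here for
                                  # simplicity; the standard uses
                                  # escapedValue())
```

When the payload length is constant from structure to structure, every structure after the configuration pays only a single bit for the length information.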
The exact bit pattern used for fill_byte must be '10100101'. Help Elements nrCoreCoderChannels In the context of a channel pair element, this variable indicates the number of channels of the central encoder that form the basis for stereo coding. Depending on the value of stereoConfigIndex, this value must be 1 or 2. nrSbrChannels In the context of a channel pair element, this variable indicates the number of channels on which SBR processing is applied. Depending on the value of stereoConfigIndex, this value must be 1 or 2. Subsidiary payloads for USAC. Terms and Definitions. UsacCoreCoderData() This data block contains the audio data of the central encoder. The payload element contains data for one or two channels of the central encoder, for both FD and LPD mode. The specific mode is signaled per channel at the beginning of the element. StereoCoreToolInfo() All stereo-related information is captured in this element. It handles the numerous bit field dependencies of the stereo coding modes. Help Elements commonCoreMode In a CPE, this indicator indicates whether both coded channels of the central encoder use the same mode. Mps212Data() This data block contains the payload for the Mps212 stereo module. The presence of these data depends on stereoConfigIndex. common_window Indicates whether channel 0 and channel 1 of a CPE use identical window parameters. common_tw Indicates whether channel 0 and channel 1 of a CPE use identical parameters for the time-warped MDCT. UsacFrame() decoding. One UsacFrame() forms one access unit of the USAC data stream. Each UsacFrame() decodes into 768, 1024, 2048 or 4096 output samples according to the outputFrameLength determined from Table X. The first bit in UsacFrame() is the usacIndependencyFlag, which determines whether a given structure can be decoded without any knowledge of the previous structure. If the usacIndependencyFlag is set to 0, then dependencies on the previous structure may be present in the payload of the current structure.
UsacFrame() is additionally made up of one or more syntactic elements that must appear in the continuous data stream in the same order as their corresponding configuration elements in UsacDecoderConfig(). The position of each element in the series of all elements is indexed by elemIdx. For each element, the corresponding configuration as transmitted in UsacDecoderConfig() at that position, that is, with the same elemIdx, must be used. These syntactic elements are of one of the four types listed in Table X. The type of each of these elements is determined by usacElementType. There can be several elements of the same type. Elements occurring at the same position elemIdx in different structures must belong to the same stream.

Table - Examples of simple possible payloads of continuous data streams

                             numElements | elemIdx | usacElementType[elemIdx]
  Mono output signal:        1           | 0       | ID_USAC_SCE
  Stereo output signal:      1           | 0       | ID_USAC_CPE
  5.1 channel output signal: 4           | 0       | ID_USAC_SCE
                                         | 1       | ID_USAC_CPE
                                         | 2       | ID_USAC_CPE
                                         | 3       | ID_USAC_LFE

If these payloads of the continuous data stream are to be transmitted over a constant rate channel, then they may include an extension payload element with a usacExtElementType of ID_EXT_ELE_FILL in order to adjust the instantaneous bit rate. In this case, an example of a coded stereo signal is:

Table - Example of a continuous stereo data stream with an extension payload for carrying the fill bits

                             numElements | elemIdx | usacElementType[elemIdx]
  Stereo output signal:      2           | 0       | ID_USAC_CPE
                                         | 1       | ID_USAC_EXT with usacExtElementType == ID_EXT_ELE_FILL

UsacSingleChannelElement() decoding. The simple structure of UsacSingleChannelElement() is made up of one instance of a UsacCoreCoderData() element with nrCoreCoderChannels set to 1. Depending on the sbrRatioIndex of this element, a UsacSbrData() element follows, with nrSbrChannels likewise set to 1. UsacExtElement() decoding. UsacExtElement() structures in a continuous data stream can be decoded or skipped by a USAC decoder.
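The ordering rule described above, namely that every structure carries its elements in exactly the order fixed by the configuration, can be sketched as a dispatch loop. The numeric type ids and handler signatures are illustrative assumptions, not the actual coded values of usacElementType.

```python
# Illustrative sketch: decode the elements of one UsacFrame()-like
# structure by dispatching on the element type recorded, per elemIdx,
# in the configuration. Type ids below are hypothetical.

ID_USAC_SCE, ID_USAC_CPE, ID_USAC_LFE, ID_USAC_EXT = 0, 1, 2, 3

def decode_frame(element_types, frame_payloads, handlers):
    """element_types: usacElementType[elemIdx] from the configuration.
    frame_payloads: per-element payloads of one structure, same order.
    handlers: mapping from type id to a decode function."""
    assert len(frame_payloads) == len(element_types)  # numElements match
    out = []
    for elem_idx, elem_type in enumerate(element_types):
        out.append(handlers[elem_type](frame_payloads[elem_idx]))
    return out
```

Since the per-position configuration is fixed once in UsacDecoderConfig(), no per-structure type signaling is needed; the decoder simply walks the configured sequence.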
Each extension is identified by a usacExtElementType, conveyed in the UsacExtElementConfig() associated with the UsacExtElement(). For each usacExtElementType a specific decoder can be present. If a decoder for the extension is available to the USAC decoder, then the payload of the extension is forwarded to the extension decoder right after the UsacExtElement() has been parsed by the USAC decoder. If no decoder for the extension is available to the USAC decoder, a minimum of structure is still provided within the continuous data stream, so that the extension can be ignored by the USAC decoder. The length of an extension element is specified either by a default length in octets, which can be signaled within the corresponding UsacExtElementConfig() and which can be overridden in the UsacExtElement(), or by length information explicitly provided in the UsacExtElement(), which is one or three octets long, using the syntactic element escapedValue(). Extension payloads that span one or more UsacFrame() can be fragmented and their payload distributed among several UsacFrame(). In this case the usacExtElementPayloadFrag indicator is set to 1, and a decoder must collect all fragments, from the UsacFrame() with usacExtElementStart set to 1 up to and including the UsacFrame() with usacExtElementStop set to 1. When usacExtElementStop is set to 1, the extension is considered complete and is passed to the extension decoder. Note that integrity protection for a fragmented extension payload is not provided by this specification, and other means must be used to ensure the integrity of extension payloads. Note that all extension payload data are assumed to be byte-aligned. Each UsacExtElement() must comply with the requirements resulting from the use of the usacIndependencyFlag. Put more explicitly, if the usacIndependencyFlag is set (== 1), UsacExtElement() must be decodable without knowledge of the previous structure (and of the extension payload that may be contained in it).
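The fragment collection rule described above can be sketched as a small accumulator: segments are gathered from the element with usacExtElementStart set, up to and including the element with usacExtElementStop set, and only the completed data block is handed to the extension decoder. The class shape and the drop policy for orphan segments are assumptions of this sketch.

```python
# Illustrative sketch of reassembling a fragmented extension payload
# from usacExtElementSegmentData-like segments with start/stop flags.

class ExtensionAssembler:
    def __init__(self):
        self.buffer = None  # None while no data block is in progress

    def feed(self, segment: bytes, start: bool, stop: bool):
        """Feed one segment; return the completed data block, or None
        while the block is still incomplete."""
        if start:
            self.buffer = bytearray()        # a new data block begins
        if self.buffer is None:
            return None                      # segment without start: drop
        self.buffer += segment
        if stop:
            block, self.buffer = bytes(self.buffer), None
            return block                     # complete: pass to decoder
        return None
```

A block contained entirely in one element simply carries both flags set, so the same code path handles the unfragmented case.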
Decoding Process. stereoConfigIndex, which is transmitted in UsacChannelPairElementConfig(), determines the exact type of stereo coding that is applied in the given CPE. Depending on this type of stereo coding, either one or two channels of the central encoder are actually transmitted in the continuous data stream, and the variable nrCoreCoderChannels needs to be set accordingly. The syntax element UsacCoreCoderData() then provides the data for one or two channels of the central encoder. Similarly, data may be available for one or two channels depending on the type of stereo coding and the use of eSBR (i.e., if sbrRatioIndex > 0). The value of nrSbrChannels needs to be set accordingly, and the syntax element UsacSbrData() provides the eSBR data for one or two channels. Finally, Mps212Data() is transmitted depending on the value of stereoConfigIndex. Low frequency enhancement channel element (LFE, low frequency enhancement), UsacLfeElement(). General. In order to maintain a regular structure in the decoder, UsacLfeElement() is defined as a standard fd_channel_stream(0,0,0,0,x) element, that is, it is equal to a UsacCoreCoderData() using the frequency domain coder. Thus, decoding can be done using the standard procedure for decoding a UsacCoreCoderData() element. In order to accommodate a more bit rate and hardware efficient implementation of the LFE decoder, however, several restrictions apply to the options used for the coding of this element: The window_sequence field is always set to 0 (ONLY_LONG_SEQUENCE); Only the lowest 24 spectral coefficients of any LFE may be non-zero; No Temporal Noise Shaping is used, that is, tns_data_present is set to 0; Time warping is not active; No noise filling is applied. UsacCoreCoderData() UsacCoreCoderData() contains all the information for decoding one or two channels of the central encoder.
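The LFE restrictions listed above can be expressed as a validity check on an already-parsed channel stream. The dictionary keys are illustrative names for the fields mentioned in the text; an actual decoder would check these conditions against its own parsed representation.

```python
# Illustrative check of the LFE coding restrictions described above,
# applied to a parsed fd_channel_stream-like record (field names are
# assumptions of this sketch).

ONLY_LONG_SEQUENCE = 0

def lfe_restrictions_ok(ch: dict) -> bool:
    coeffs = ch["spectral_coefficients"]
    return (
        ch["window_sequence"] == ONLY_LONG_SEQUENCE  # long windows only
        and all(c == 0 for c in coeffs[24:])         # only lowest 24 non-zero
        and ch["tns_data_present"] == 0              # no TNS
        and not ch["time_warping_active"]            # no time warping
        and not ch["noise_filling"]                  # no noise filling
    )
```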
The decoding order is: Obtain core_mode[] for each channel; In the case of two channels of the central encoder (nrChannels == 2), parse StereoCoreToolInfo() and determine all stereo-related parameters; Depending on the signaled core modes, an lpd_channel_stream() or an fd_channel_stream() is transmitted for each channel. As can be seen from the list above, decoding one channel of the central encoder (nrChannels == 1) results in obtaining the core_mode bit followed by one lpd_channel_stream or fd_channel_stream, depending on the core_mode. In the case of two channels of the central encoder, some signaling redundancies between the channels can be exploited, in particular if the core mode of both channels is 0. See 6.2.X (StereoCoreToolInfo() decoding) for more details. StereoCoreToolInfo() StereoCoreToolInfo() allows efficient coding of parameters whose values can be shared by the central encoder channels of a CPE in case both channels are coded in FD mode (core_mode[0,1] == 0). In particular, the following data elements are shared when the appropriate indicator in the data stream is set to 1.

Table - Data elements shared by the channels of a channel pair of the central encoder

  indicator set to 1              | channels 0 and 1 share the following elements:
  common_window                   | ics_info()
  common_window && common_max_sfb | max_sfb
  common_tw                       | tw_data()
  common_tns                      | tns_data()

If the appropriate indicator is not set, then the data elements are transmitted individually for each channel of the central encoder, either in StereoCoreToolInfo() (max_sfb, max_sfb1) or in the fd_channel_stream() which follows the StereoCoreToolInfo() in the UsacCoreCoderData() element. In the case of common_window == 1, StereoCoreToolInfo() also contains information about M/S stereo coding and complex prediction data in the MDCT domain (see 7.7.2).
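The sharing rule in the table above, read one element once when its indicator is 1 and otherwise once per channel, can be sketched as follows. This is a simplified sketch: the read callbacks and field names are assumptions, and the nested common_max_sfb case is omitted.

```python
# Illustrative sketch of the shared/individual parameter reading rule of
# StereoCoreToolInfo() (simplified; max_sfb handling omitted).

def read_stereo_core_tool_info(read_flag, read_field):
    """read_flag(name) -> 0/1, read_field(name) -> parsed field.
    Returns one dictionary of parsed fields per channel."""
    ch = [{}, {}]
    for flag, field in (("common_window", "ics_info"),
                        ("common_tw", "tw_data"),
                        ("common_tns", "tns_data")):
        if read_flag(flag):
            shared = read_field(field)            # read once ...
            ch[0][field] = ch[1][field] = shared  # ... shared by both
        else:
            ch[0][field] = read_field(field + "_ch0")
            ch[1][field] = read_field(field + "_ch1")
    return ch
```

The saving is exactly the point made in the text: when both channels agree, one copy of the element plus a one-bit indicator replaces two copies.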
The presence of these data depends on sbrRatioIndex. SbrInfo() This element contains SBR control parameters that do not require a reset of the decoder when changed. SbrHeader() This element contains SBR header data with SBR configuration parameters, which typically do not change over the duration of a continuous data stream. SBR payload in USAC. In USAC, the SBR payload is transmitted in UsacSbrData(), which is an integral part of each single channel element or channel pair element. UsacSbrData() immediately follows UsacCoreCoderData(). There is no SBR payload for LFE channels. numSlots The number of time slots in an Mps212Data structure. Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or to a feature of a method step. Similarly, the aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Depending on certain implementation requirements, the applications of the invention can be implemented in hardware or in software. The implementation can be carried out using a digital storage medium, for example, a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is carried out. Some applications according to the invention comprise a non-transitory data carrier having electronically readable control signals, which can cooperate with a programmable computer system, such that one of the methods described herein is performed. The encoded audio signal can be transmitted via a wired or wireless transmission medium, or can be stored on a machine-readable medium or on a non-transitory storage medium.
Generally, the applications of the present invention can be implemented as a computer program product with a program code, the program code being operative to perform one of the methods when the computer program product runs on a computer. The program code can, for example, be stored on a machine-readable medium. Other applications comprise the computer program for performing one of the methods described herein, stored on a machine-readable medium. In other words, an application of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer. Another application of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. Another application of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or sequence of signals can, for example, be configured to be transferred over a data communication connection, for example, over the Internet. A further application comprises a processing means, for example, a computer or a programmable logic device, configured or adapted to perform one of the methods described herein. A further application comprises a computer having installed thereon the computer program for performing one of the methods described herein. In some applications, a programmable logic device (for example, an arrangement of programmable logic gates) can be used to perform some or all of the functionality of the methods described herein. In some applications, an arrangement of programmable logic gates can cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.
The applications described above are merely illustrative of the principles of the present invention. It should be understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is the intention, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the applications herein.
Claims (25) [1] 1. Continuous data flow comprising a configuration block (28) and a sequence of structures (20) respectively representing the consecutive periods of time (18) of an audio content (10), characterized by the sequence of structures (20) being a composition of N sequences of elements of the structure (22), with each element of the structure (22) being of a respective type of a plurality of element types, so that each structure (20) comprises one element of the structure (22) out of the N sequences of elements of the structure (22), respectively, and, for each sequence of elements of the structure (22), the elements of the structure (22) are of the same element type with respect to each other, in which the configuration block (28) comprises, for at least one of the sequences of elements of the structure (22), a standard payload length information (60) on a standard payload length, and in which each element of the structure (22) of at least one of the sequences of elements of the structure (22) comprises length information (58) comprising, for at least a subset of the elements of the structure (22) of at least one of the sequences of elements of the structure (22), a standard payload length indicator (64) followed, if the standard payload length indicator (64) is not defined, by a payload length value (66), in which any element of the structure of at least one of the sequences of elements of the structure (22), the standard extension payload length indicator (64) of which is defined, has the standard payload length, and any element of the structure of at least one of the sequences of elements of the structure (22), the standard extension payload length indicator (64) of which is not defined, has a payload length corresponding to the payload length value (66). [2] 2.
Continuous data flow according to claim 1, characterized in that the configuration block (28) comprises a field (50) indicating a number of elements N, and a portion of the type indication syntax (52) indicating, for each position of the element in a sequence of N positions of the element, an element type out of a plurality of element types; wherein each element of the structure is of the element type indicated, by the portion of the type indication syntax (52), for the respective position of the element in which the respective element of the structure (22) is positioned within the sequence of N elements of the structure of the respective structure (20) in the continuous data stream (12). [3] 3. Continuous data flow according to claim 2, characterized in that the portion of the type indication syntax (52) comprises a sequence of N syntax elements (54), with each syntax element (54) indicating the element type for the respective position of the element in which the respective syntax element (54) is positioned within the portion of the type indication syntax (52). [4] 4. Continuous data flow according to any one of claims 1 to 3, characterized in that the configuration block (28) comprises one configuration element (56) per sequence of elements of the structure (22), comprising the configuration information for the element type which the elements of the structure of the respective sequence of elements of the structure are of. [5] 5. Continuous data flow according to claim 4, characterized in that the portion of the type indication syntax (52) comprises a sequence of N syntax elements (54), with each syntax element (54) indicating the element type for the respective position of the element in which the respective syntax element (54) is positioned within the portion of the type indication syntax (52), and the configuration elements (56) and the syntax elements are arranged in the continuous data stream alternately. [6] 6.
Continuous data flow according to claim 5, characterized in that, for each element of the structure (22) of at least one sequence of elements of the structure (22), the length information (58) comprises a present extension payload indicator (70), in which any element of the structure (22b), the present extension payload indicator (70) of the length information (58) of which is not defined, merely consists of the present extension payload indicator (70), and the length information (58) of any element of the structure (22b), the present payload data indicator (70) of the length information (58) of which is defined, further comprises the standard payload length indicator (64) followed, if the standard payload length indicator (64) is not defined, by the payload length value (66). [7] 7. Continuous data flow according to any one of claims 1 to 6, characterized in that the configuration block (28) comprises, for at least one of the sequences of elements of the structure (22), a configuration element (56) comprising configuration information, wherein the configuration information comprises an extension element type field (72) indicating a payload data type out of a plurality of payload data types, in which the plurality of payload data types comprises a type of information on the multichannel side and a type of information on the multi-object coding side, in which the configuration information, the extension element type field (72) of which indicates the information on the multichannel side, further comprises the configuration data of the information on the multichannel side (74), and the configuration information, the extension element type field (72) of which indicates the type of information on the multi-object side, further comprises the configuration data of the information on the multi-object side (74), and the elements of the structure (22b) of at least one sequence of elements of the structure (22) carry the payload data of the data
type of the payload indicated by the extension element type field (72) of the configuration information of the configuration element for the respective sequence of elements of the structure. [8] 8. Decoder for decoding a continuous data stream (12) comprising a configuration block (28) and a sequence of structures (20) respectively representing the consecutive time periods of an audio content (10), characterized by the sequence of structures (20) being a composition of N sequences of elements of the structure (22), with each element of the structure (22) being of a respective type of a plurality of element types, so that each structure (20) comprises one element of the structure (22) out of the N sequences of elements of the structure (22), respectively, and, for each sequence of elements of the structure (22), the elements of the structure (22) are of the same element type with respect to each other, wherein the decoder is configured to analyze the continuous data stream (12) and reconstruct the audio content based on a subset of the sequences of elements of the structure and, with respect to at least one of the sequences of elements of the structure (22) not belonging to the subset of the sequences of elements of the structure, read from the configuration block (28), for at least one of the sequences of elements of the structure (22), a standard payload length information (60) on a standard payload length, and, for each element of the structure (22) of at least one of the sequences of elements of the structure (22), read length information from the continuous data stream (12), the reading of the length information (58) comprising, for at least a subset of the elements of the structure (22) of at least one of the sequences of elements of the structure (22), reading a standard payload length indicator (64) followed, if the standard payload length indicator (64) is not defined, by reading a payload length value (66), and skip, when analyzing the continuous data stream (12), any element of the structure
of at least one of the sequences of elements of the structure (22), the standard extension payload length indicator (64) of which is defined, using the standard payload length as the length of the jump interval, and any element of the structure of at least one of the sequences of elements of the structure (22), the standard payload length indicator (64) of which is not defined, using a payload length corresponding to the payload length value (66) as the length of the jump interval. [9] 9. Decoder according to claim 8, characterized in that the decoder is configured to, when reading the configuration block (28), read a field (50) indicating the number of elements N, and a portion of the type indication syntax (52) indicating, for each position of the element of a sequence of N positions of the element, an element type out of a plurality of element types, in which the decoder is configured to decode each structure (20) by decoding each element of the structure (22) according to the element type indicated, by the portion of the type indication syntax, for the respective position of the element in which the respective element of the structure is positioned within the sequence of N elements of the structure (22) of the respective structure (20) in the continuous data stream (12). [10] 10. Decoder according to claim 9, characterized in that the decoder is configured to read a sequence of N syntax elements (54) from the portion of the type indication syntax (52), with each syntax element indicating the element type for the respective position of the element in which the respective syntax element is positioned in the sequence of N syntax elements.
[11] 11. Decoder according to any one of claims 8 to 10, characterized in that the decoder is configured to read a configuration element (56) for each sequence of elements of the structure from the configuration block (28), with each configuration element comprising configuration information for the respective sequence of elements of the structure, in which the decoder is configured to, in the reconstruction of the audio content based on a subset of the sequences of elements of the structure, decode each element of the structure (22) of the subset of sequences of elements of the structure using the configuration information of the respective configuration element. [12] 12. Decoder according to claim 11, characterized in that the portion of the type indication syntax (52) comprises a sequence of N syntax elements (54), with each syntax element (54) indicating the element type for the respective position of the element in which the respective syntax element (54) is positioned within the portion of the type indication syntax (52), and the decoder is configured to read the configuration elements (56) and the syntax elements (54) from the continuous data stream (12) alternately. [13] 13.
Decoder according to any one of claims 8 to 12, characterized in that the decoder is configured to, when reading the length information (58) of any element of the structure of at least one sequence of elements of the structure, read a present extension payload indicator (70) from the continuous data stream (12), check whether the present extension payload indicator (70) is defined, and, if the present extension payload indicator (70) is not defined, stop reading the respective element of the structure (22b) and continue with the reading of another element of the structure (22) of the current structure (20) or of an element of the structure of a subsequent structure (20), and, if the present payload data indicator (70) is defined, continue with reading the standard payload length indicator (64) followed, if the standard payload length indicator (64) is not defined, by the payload length value (66) from the continuous data stream (12), and with the jump. [14] 14. Decoder according to any one of claims 8 to 13, characterized in that the decoder is configured to, when reading the standard payload length information (60), read a present standard payload length indicator from the continuous data stream (12), check whether the present standard payload length indicator is defined, if the present standard payload length indicator is not defined, set the standard extension payload length to zero, and, if the present standard payload length indicator is defined, explicitly read the standard extension payload length from the data stream. [15] 15.
Decoder according to any one of claims 8 to 14, characterized in that the decoder is configured to, when reading the configuration block (28), for each sequence of elements of the structure of at least one sequence of elements of the structure, read from the data stream (12) a configuration element (56) comprising configuration information for an extension element type, wherein the configuration information comprises an extension element type field (72) indicating a payload data type out of a plurality of payload data types. [16] 16. Decoder according to claim 15, characterized in that the plurality of payload data types comprises a type of information on the multichannel side and a type of information on the multi-object coding side, the decoder being configured to, when reading the configuration block (28), for each of at least one sequence of elements of the structure, if the extension element type field (72) indicates the type of information on the multichannel side, read the configuration data of the information on the multichannel side (74) as part of the configuration information from the data stream (12), and, if the extension element type field (72) indicates the type of information on the multi-object side, read the configuration data of the information on the multi-object side (74) as part of the configuration information from the data stream, and the decoder is configured to, when decoding each structure, decode the elements of the structure of any one of at least one sequence of elements of the structure, for which the type of extension element of the configuration element (56) indicates the type of information on the multichannel side, configuring a multichannel decoder (44e) using the configuration data of the information on the multichannel side (74) and feeding the multichannel decoder thus configured (44e) with payload data (68) of the elements of the structure (22b) of the respective sequence of elements of the structure as information on the multichannel side, and decode the structure
elements of any one of at least one sequence of elements of the structure, for which the type of extension element of the configuration element (56) indicates the type of information on the multi-object side, configuring a multi-object decoder (44d) using the configuration data of the information on the multi-object side (74) and feeding the multi-object decoder thus configured (44d) with payload data (68) of the elements of the structure (22) of the respective sequence of elements of the structure. [17] 17. Decoder according to claim 15 or 16, characterized in that the decoder is configured to read, for any one of at least one sequence of elements of the structure, a field of the length of the configuration data (76) from the continuous data stream (12) as part of the configuration information of the configuration element for the respective sequence of elements of the structure, verify whether the payload data type indicated by the extension element type field (72) of the configuration information of the configuration element for the respective sequence of elements of the structure belongs to a predetermined set of payload data types being a subset of the plurality of payload data types, if the payload data type indicated by the extension element type field (72) of the configuration information of the configuration element for the respective sequence of elements of the structure belongs to the predetermined set of payload data types, read the configuration data dependent on the payload data (74) as part of the configuration information of the configuration element for the respective sequence of elements of the structure from the data stream (12), and decode the elements of the structure of the respective sequence of elements of the structure in the structures (20) using the configuration data dependent on the payload data (74), and, if the payload data type indicated by the extension element type field (72) of the configuration information of the configuration element for the respective sequence of elements of the structure does not belong to the predetermined set of payload
data types, skip the configuration data dependent on the payload data (7 4) using the length of the configuration data, and skip the structure elements of the respective sequence of structure elements in the structures (20) using the length information (58) in it. [18] 18. Decoder according to any one of claims 8 to 17, characterized in that the decoder is configured to, when reading the configuration block (28), for each of at least one sequence of elements of the structure, read an element configuration (56) comprising configuration information for a type of data stream extension element (12), wherein the configuration information comprises a fragmentation usage indicator (78), and the decoder is configured to, at the read structure elements (22) from any sequence of structure elements in which the configuration element fragmentation usage indicator (78) is set, read fragmentation information from the data stream, and use fragmentation information to put the payload data of these consecutive structure elements together. [19] 19. Decoder according to any one of claims 8 to 18, characterized in that the decoder is configured so that the decoder reconstructs an audio signal from the structure elements (22) of one of the subset of the structure element sequences that are of a single channel element type. [20] 20. Decoder according to any of the claims from 8 to 19, characterized in that the decoder is configured so that the decoder reconstructs an audio signal from the elements of the structure (22) of one of the subsets of the element sequences of the structure that are of a channel pair element type. [21] 21. Decoder according to any one of claims 8 to 20, characterized in that the decoder is configured to use the same variable length code to read the length information (80), the extension element type field (72), and the configuration data length field (76). [22] 22. 
Encoder for encoding an audio content into a data stream, characterized in that the encoder is configured to: encode consecutive time periods (18) of the audio content (10) into a sequence of frames (20) respectively representing the consecutive time periods (18) of the audio content (10), such that the sequence of frames (20) is a composition of N sequences of frame elements (22), each frame element (22) being of a respective one of a plurality of element types, such that each frame (20) comprises one frame element (22) out of each of the N sequences of frame elements (22), respectively, and, for each sequence of frame elements (22), the frame elements (22) are of equal element type relative to one another; encode into the data stream (12) a configuration block (28) comprising, for at least one of the sequences of frame elements (22), a default payload length information (60) on a default payload length; and encode each frame element (22) of the at least one of the sequences of frame elements (22) into the data stream (12) such that it comprises length information (58) comprising, for at least a subset of the frame elements (22) of the at least one of the sequences of frame elements (22), a default payload length flag (64) followed, if the default payload length flag (64) is not set, by a payload length value (66), so that any frame element of the at least one of the sequences of frame elements (22) whose default extension payload length flag (64) is set has the default payload length, and any frame element of the at least one of the sequences of frame elements (22) whose default extension payload length flag (64) is not set has a payload length corresponding to the payload length value (66). [23] 23.
Method for decoding a data stream (12) comprising a configuration block (28) and a sequence of frames (20) respectively representing consecutive time periods of an audio content, wherein the sequence of frames (20) is a composition of N sequences of frame elements (22), each frame element (22) being of a respective one of a plurality of element types, such that each frame (20) comprises one frame element (22) out of each of the N sequences of frame elements (22), respectively, and, for each sequence of frame elements (22), the frame elements (22) are of equal element type relative to one another, characterized in that the method comprises: parsing the data stream (12) and reconstructing the audio content based on a subset of the frame element sequences and, with respect to at least one frame element sequence (22) not belonging to the subset of frame element sequences: reading from the configuration block (28), for the at least one of the sequences of frame elements (22), a default payload length information (60) on a default payload length; for each frame element (22) of the at least one of the sequences of frame elements (22), reading length information from the data stream (12), the reading of the length information comprising, for at least a subset of the frame elements (22) of the at least one of the sequences of frame elements (22), reading a default payload length flag (64) followed, if the default payload length flag (64) is not set, by reading a payload length value (66); and skipping, in parsing the data stream (12), any frame element of the at least one of the sequences of frame elements (22) whose default extension payload length flag (64) is set, using the default payload length as skip interval length, and any frame element of the at least one of the sequences of frame elements (22) whose default extension payload length flag (64) is not set, using a payload length corresponding to the payload length value (66) as skip interval length. [24] 24. Method for encoding an audio content into a data stream, characterized in that the method comprises: encoding consecutive time periods (18) of the audio content (10) into a sequence of frames (20) respectively representing the consecutive time periods (18) of the audio content (10), such that the sequence of frames (20) is a composition of N sequences of frame elements (22), each frame element (22) being of a respective one of a plurality of element types, such that each frame (20) comprises one frame element (22) out of each of the N sequences of frame elements (22), respectively, and, for each sequence of frame elements (22), the frame elements (22) are of equal element type relative to one another; encoding into the data stream (12) a configuration block (28) comprising, for at least one of the sequences of frame elements (22), a default payload length information (60) on a default payload length; and encoding each frame element (22) of the at least one of the sequences of frame elements (22) into the data stream (12) such that it comprises length information (58) comprising, for at least a subset of the frame elements (22) of the at least one of the sequences of frame elements (22), a default payload length flag (64) followed, if the default payload length flag (64) is not set, by a payload length value (66), so that any frame element of the at least one of the sequences of frame elements (22) whose default extension payload length flag (64) is set has the default payload length, and any frame element of the at least one of the sequences of frame elements (22) whose default extension payload length flag (64) is not set has a payload length corresponding to the payload length value (66). [25] 25. Computer program, characterized in that, when executed on a computer, it carries out the method according to claim 23 or claim 24.
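The default ("standard") payload-length signaling recited in the claims can be illustrated with a small sketch. Everything below is a hypothetical toy bitstream, not the actual USAC syntax (which uses escaped-value coding and different field widths); the helper names and the 16-bit fields are assumptions for illustration only. The configuration block carries the default payload length once; each frame element then costs a single flag bit, with an explicit length value transmitted only when its payload deviates from the default, which is what lets a parser skip elements cheaply.

```python
# Hypothetical sketch of default-payload-length signaling; bit layout,
# field widths, and names are assumptions, not the real USAC syntax.

class BitWriter:
    def __init__(self):
        self.bits = []

    def write(self, value, nbits):
        for i in reversed(range(nbits)):          # MSB first
            self.bits.append((value >> i) & 1)

    def to_bytes(self):
        padded = self.bits + [0] * (-len(self.bits) % 8)
        return bytes(
            int("".join(map(str, padded[i:i + 8])), 2)
            for i in range(0, len(padded), 8)
        )

class BitReader:
    def __init__(self, data):
        self.data = data
        self.pos = 0                              # bit position

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

def encode(default_len, payload_lengths):
    """Configuration block: default length once; per frame element: a
    1-bit flag, plus an explicit length only for a deviating payload."""
    w = BitWriter()
    w.write(default_len, 16)          # default payload length info (60)
    for plen in payload_lengths:
        if plen == default_len:
            w.write(1, 1)             # default payload length flag set
        else:
            w.write(0, 1)             # flag not set ...
            w.write(plen, 16)         # ... explicit payload length value (66)
    return w.to_bytes()

def decode_lengths(data, n_elements):
    """Recover the skip-interval length of each frame element."""
    r = BitReader(data)
    default_len = r.read(16)          # from the configuration block
    lengths = []
    for _ in range(n_elements):
        if r.read(1):                 # flag set -> default length applies
            lengths.append(default_len)
        else:                         # flag not set -> explicit value follows
            lengths.append(r.read(16))
    return lengths
```

With a well-chosen default, most frame elements spend one bit instead of a full length field, which is the transmission-efficiency gain the abstract refers to.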
Priority: US 61/454,121, filed 2011-03-18; PCT/EP2012/054823, filed 2012-03-19, "Frame element length transmission in audio coding".